ABSTRACT

We test the reliability of a method to measure the mean halo mass of absorption line systems
such as damped Ly$\alpha$ absorbers (DLAs). The method is based on measuring the ratio of the
cross-correlation between DLAs and galaxies to the autocorrelation of the galaxies themselves,
which is (in linear theory) the ratio of their bias factors $b$. We show that the ratio of the
projected cross- and autocorrelation functions, $w_{dg}(r_\theta)/w_{gg}(r_\theta)$, is also the
ratio of their bias factors irrespective of the galaxy distribution, provided that one uses the
same galaxies for $w_{dg}(r_\theta)$ and $w_{gg}(r_\theta)$. Thus, the method requires only
multi-band imaging of DLA fields, and is applicable to all redshifts. Here, we focus on $z = 3$
DLAs. We demonstrate that the cross-correlation method robustly constrains the mean DLA halo
mass using smoothed particle hydrodynamics (SPH) cosmological simulations that resolve DLAs and
galaxies in halos of mass $> 5.2 \times 10^{10}\,M_\odot$. If we use the bias formalism of
Mo & White (2002) with the DLA and galaxy mass distributions of these simulations, we predict
an amplitude ratio $w_{dg}/w_{gg}$ of 0.771. Direct measurement of these correlation functions
from the simulations yields $w_{dg}/w_{gg} = \bar b_{\rm DLA}/\bar b_{\rm gal} = 0.73 \pm 0.08$,
in excellent agreement with that prediction. Equivalently, inverting the measured correlation
ratio to infer the (logarithmically) averaged DLA halo mass yields
$\langle \log M_{\rm DLA}(M_\odot) \rangle = 11.13^{+0.13}_{-0.13}$, in excellent agreement with
the true value in the simulations: $\langle \log M_{\rm DLA} \rangle = 11.16$ is the
probability-weighted mean mass of the DLA host halos in the simulations. The cross-correlation
method thus appears to yield a robust estimate of the average host halo mass even though the
DLAs and the galaxies occupy a broad mass spectrum of halos, and massive halos contain multiple
galaxies with DLAs. If we consider subsets of the simulated galaxies with high star formation
rates (representing Lyman break galaxies [LBGs]), then both correlations are higher, but their
ratio still implies the same DLA host mass, irrespective of the galaxy subsample, i.e., the
cross-correlation technique is also reliable. The inferred mean DLA halo mass,
$\langle \log M_{\rm DLA} \rangle = 11.13^{+0.13}_{-0.13}$, is an upper limit since the
simulations do not resolve halos less massive than $\sim 10^{10.5}\,M_\odot$. Thus, our results
imply that the correlation length between DLAs and LBGs is predicted to be, at most,
$\sim 2.85\,h^{-1}$ Mpc given that $z = 3$ LBGs have a correlation length of
$r_0 \simeq 4\,h^{-1}$ Mpc. While the small size of current observational samples does not allow
strong conclusions, future measurements of this cross-correlation can definitively distinguish
models in which many DLAs reside in low mass halos from those in which DLAs are massive disks
occupying only high mass halos.
Subject headings: cosmology: theory — galaxies: evolution — galaxies: high-redshift — quasars: 
absorption lines 



1. INTRODUCTION 



Damped Ly$\alpha$ absorbers (DLAs), which cause the strongest absorption lines found in quasar
spectra, have neutral hydrogen (H I) column densities greater than $2 \times 10^{20}$ cm$^{-2}$.
Their integrated column density distribution implies that DLAs contain the largest reservoir of
H I at high redshifts (e.g., Lanzetta et al. 1991, 1995; Ellison et al. 2001; Péroux et al.
2003). They contain more H I than all the absorption line clouds in the Ly$\alpha$ forest
combined; and in an $\Omega_M = 1$ universe, they contain as much hydrogen as the comoving mass
density of stars in disk galaxies today. This led Wolfe et al. (1986) to put forward the
hypothesis that DLAs are large, thick gaseous disk galaxies. This hypothesis has been debated
since. On one hand, absorption-line velocity profiles of low-ionization species of DLAs seem to
be consistent with those expected from lines of sight intercepting rotating thick gaseous disks
(Wolfe et al. 1995; Prochaska & Wolfe 1997b; Ledoux et al. 1998). Prochaska & Wolfe (1997b)
argue that the most likely rotation velocity is $\sim 225$ km s$^{-1}$, i.e., that DLAs are
typically as massive ($10^{12}\,M_\odot$) as $L^*(z = 0)$ galaxies. On the other hand,
McDonald & Miralda-Escudé (1999) and Haehnelt et al. (1998, 2000) have shown that a large range
of structures and morphologies, rather than a single uniform type of galaxy, can account for the
observed DLA kinematics. At least at low redshifts ($z < 1$), this is supported by observations
(Le Brun et al. 1997; Kulkarni et al. 2000; Rao & Turnshek 2000).

Early predictions of DLA properties were made using cosmological simulations (Katz et al. 1996b)
and semianalytical simulations of galaxy formation (Kauffmann 1996). Then, Gardner et al.
(1997a) extended the results of Katz et al. (1996b) to predict the DLA statistics (e.g.,
$dN/dz$) accounting for the limited resolution of those simulations. They developed a
semianalytical method to correct the numerical predictions for the contribution of unresolved
low-mass halos and found that roughly half of these systems reside in halos with circular
velocities $V_c > 100$ km s$^{-1}$, and half in halos with 35 km s$^{-1}$ $< V_c <$ 100 km
s$^{-1}$. Interestingly, Gardner et al. (1997b) found that "a CDM model with $\Omega_0 = 0.4$,
$\Omega_\Lambda = 0.6$ gives an acceptable fit to the observed absorption statistics," whereas
other models did not match the observations so well. More recently, Gardner et al. (2001) found
that there was an anti-correlation between the absorber cross section and the projected distance
to the nearest galaxy, and that DLAs arise out to 10-15 kpc. Indeed, they found that the mean
cross section for DLA absorption is much larger than what one would estimate based on the
collapse of the baryons into a centrifugally supported disk. To match the observed DLA
abundances, they required an extrapolation of the mass function to small halos down to a
cut-off of $V_c = 50$-$80$ km s$^{-1}$.

Other work, such as that of Mo et al. (1999), Haehnelt et al. (1998, 2000), Nagamine et al.
(2004), and Okoshi et al. (2004), indicates that DLAs are mostly faint (sub-$L^*_{z=0}$)
galaxies in small dark matter halos with $V_c \ll 100$ km s$^{-1}$. However, the exact fraction
of DLAs in such halos is a strong function of resolution, as shown by Nagamine et al. (2004).
Fynbo et al. (1999) and Schaye (2001) used cross section arguments and reached similar
conclusions. For instance, Schaye (2001) argued that the observed Lyman break galaxy (LBG)
number density alone ($n = 0.016\,h^3$ Mpc$^{-3}$ down to $0.1\,L^*$) can account for all DLA
absorptions at $z \sim 3$ if the cross section for DLA absorption is $\pi r^2$ with
$r = 19\,h^{-1}$ kpc, much larger than the luminous parts of most LBGs (Lowenthal et al. 1997).
However, Schaye (2001) pointed out that the cross section can be much smaller than this, if a
fraction of DLA systems arise in outflows or if $n$ is much larger (i.e., there are many LBGs or
other galaxies not yet detected). In the semianalytical models of Maller et al. (2000), DLAs
arise from the combined effects of massive central galaxies and a number of smaller satellites
within $100\,h^{-1}$ kpc in virialized halos. From all these studies, it appears that the
low-mass hypothesis is favored against the thick gaseous disk model of Wolfe et al. (1986, 1995)
and Prochaska & Wolfe (1997a). A strong constraint on the nature of DLAs will come from a
measure of the typical DLA halo mass.

In order to constrain the mass of $z \sim 3$ DLAs, several groups (Gawiser et al. 2001;
Adelberger et al. 2003, hereafter ASSP03; Bouché & Lowenthal 2003, 2004 [hereafter BL04];
Bouché 2003; J. Cooke et al., 2004, private communication) are using Lyman break galaxies
(LBGs) as large-scale structure tracers to measure the DLA-LBG cross-correlation, given that in
hierarchical galaxy formation models, different DLA masses will lead to different clustering
properties with the galaxies around them. Specifically, the DLA-galaxy cross-correlation yields
a measurement of the dark matter halo mass associated with DLAs relative to that of the
galaxies. In particular, if the galaxies are less (more) correlated with the DLAs than with
themselves, this will imply that the halos of DLAs are less (more) massive than those of the
galaxies.

The purpose of this paper is to use cosmological simulations to demonstrate that
cross-correlation techniques will uniquely constrain the mean DLA halo mass, and to compare the
results with observations. The advantage of using cosmological simulations is that one can check
the reliability of the clustering results, given that the mean halo mass of any population is a
known quantity in the simulations. We find that the DLA-galaxy cross-correlation implies a mean
DLA halo mass (logarithmically averaged) of
$\langle \log M_{\rm DLA}(M_\odot) \rangle \simeq 11.13^{+0.13}_{-0.13}$, close to the
$\langle \log M_{\rm DLA}(M_\odot) \rangle = 11.16$ expected from the DLA halo mass
distribution. The method is generally applicable to any redshift, but we focus here on $z = 3$.

Section 2 presents the numerical simulations used in this paper. Section 3 lays the foundations
of our clustering analysis. Our results are presented in section 4 along with a comparison to
current observational results. A discussion of the implications of our results is presented in
section 5.

2. SIMULATIONS 

We use the TreeSPH simulations of Katz et al. (1996a), parallelized by Davé et al. (1997), which
combine smoothed particle hydrodynamics (SPH; Lucy 1977) with the tree algorithm for computation
of the gravitational force (Hernquist 1987). This formulation is completely Lagrangian, i.e., it
follows each particle in space and time. The simulations include dark matter, gas, and stars.
Dark matter particles are collisionless and influenced only by gravity, while gas particles are
influenced by pressure gradients and shocks in addition to gravity, and can cool radiatively.
Gas particles are transformed into collisionless stars when the following conditions are met:
the local density reaches a certain threshold ($n_{\rm H} > 0.1$ cm$^{-3}$), and the particles
are colder than a threshold temperature ($T < 30{,}000$ K) and are part of a Jeans-unstable
convergent flow (see Katz et al. 1996a for details). A Miller & Scalo (1979) initial mass
function of stars is assumed. Stars of mass greater than $8\,M_\odot$ become supernovae and
inject $10^{51}$ erg of pure thermal energy into neighboring gas particles. Thus, the star
formation rate (SFR) is known for each galaxy. Photoionization by a spatially uniform UV
background (Haardt & Madau 1996) is included.

The simulation was run from redshift $z = 49$ to redshift $z = 0$ with the following
cosmological parameters:
$\Omega_M = 0.4$, $\Omega_\Lambda = 0.6$, $h \equiv H_0/(100$ km s$^{-1}$ Mpc$^{-1}) = 0.65$,
$\Omega_b = 0.02\,h^{-2}$, a primordial power spectrum index $n = 0.93$, and $\sigma_8 = 0.8$
for the amplitude of mass fluctuations. In this paper, we use the $z = 3$ output. The simulation
has $128^3$ dark matter particles and the same number of gas particles in a periodic box of
$22.222\,h^{-1}$ Mpc (comoving) on a side, with a gravitational softening length of
$3.5\,h^{-1}$ kpc (Plummer equivalent). The mass of a dark matter particle is
$8.2 \times 10^{8}\,M_\odot$, and the mass of a baryonic particle is
$1.09 \times 10^{8}\,M_\odot$. We identify dark matter halos by using a "friends-of-friends"
algorithm (Davis et al. 1985) with a linking length of 0.173 times the mean interparticle
separation. There are 1770 resolved dark matter halos with a minimum of 64 dark matter particles
($5.2 \times 10^{10}\,M_\odot$).

We use the group finding algorithm "spline kernel interpolative denmax" (SKID; Katz et al.
1996a) to find galaxies in the simulations. We refer the reader to Kereš et al. (2004) for a
detailed discussion of the SKID algorithm. There are 651 galaxies resolved with a minimum of 64
SPH particles (or $6.9 \times 10^{9}\,M_\odot$). Figure 1 shows the SFR as a function of total
halo mass (dark matter + baryons; left) and baryonic mass (right) for the 651 SKID-identified
galaxies. The line shows the running mean (in $\log M$) with a decreasing SFR threshold.
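The resolution limits quoted above follow directly from the particle masses and the 64-particle minimum. The short sketch below is only an arithmetic check; the variable and function names are illustrative and do not come from the simulation code.

```python
# Particle masses quoted in the text, in solar masses.
M_DM_PARTICLE = 8.2e8     # dark matter particle
M_GAS_PARTICLE = 1.09e8   # baryonic (SPH) particle
N_MIN = 64                # minimum particle count for a resolved object


def resolution_mass(particle_mass, n_min=N_MIN):
    """Smallest resolved mass for a given particle mass."""
    return n_min * particle_mass


halo_limit = resolution_mass(M_DM_PARTICLE)     # resolved dark matter halos
galaxy_limit = resolution_mass(M_GAS_PARTICLE)  # resolved SKID galaxies
print(f"halo limit   = {halo_limit:.3e} Msun")
print(f"galaxy limit = {galaxy_limit:.3e} Msun")
```

The two products are about $5.25 \times 10^{10}$ and $6.98 \times 10^{9}\,M_\odot$, matching the quoted limits to within rounding.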

The rest-UV spectra and colors of observed LBGs are dominated by the light from massive stars
(Lowenthal et al. 1997; Pettini et al. 2001). To simulate various "flux-simulated" LBG samples
in the simulations, we selected six subsamples of galaxies according to their SFR, consisting of
the 7, 25, 50, 100, 200, and 400 most star-forming galaxies. The corresponding SFR thresholds
and mean masses $\langle \log M \rangle$ for each of the subsamples are marked with the filled
circles in Figure 1 (left), labeled 1-6. Naturally, real LBGs are color-selected, so this SFR
selection can only be an approximation. Davé et al. (1999) discuss the properties of LBGs in
numerical simulations similar to this one.

We select DLAs from the simulations as follows. We compute the H I column density
($N_{\rm HI}$) from the gas density projected onto a uniform grid with $4096^2$ pixels, each
$5.43\,h^{-1}$ kpc comoving (or 2 kpc physical) in size, corresponding to the smoothing length.
Each gas particle is projected onto the grid in correct proportions to the pixel(s) it subtends
given its smoothing length. Since DLAs occur in dense regions, however, the smoothing lengths
are typically equal to or smaller than the pixel size. We first assume that the gas is optically
thin, and then correct the column densities for the ionization background using a
self-shielding correction, as in Katz et al. (1996b). The H I column density projected along the
x-axis is shown in Figure 2(a). A pixel is selected as a DLA from the column density map if
$N_{\rm HI}$ is greater than $10^{20.3}$ cm$^{-2}$. There are approximately 115,000 pixels that
meet this criterion, shown in Figure 2(b). We assume that each such pixel is a potential DLA.
Figure 2(c) shows the positions of the 651 galaxies that have a baryonic mass $M_b$ larger than
the resolution $6.8 \times 10^{9}\,M_\odot$. Figure 2(d) shows the positions of the 100 galaxies
with the highest star formation rate, and the positions of the simulated LBGs as red crosses.
From Figure 2 one can already see that the galaxies and the DLAs are correlated.
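The pixel scale and the DLA selection described above can be sketched as follows. This is a minimal illustration, assuming the column density map is available as a 2-D array of $\log_{10} N_{\rm HI}$; the toy map values and the function name `select_dla_pixels` are made up for the example, and the self-shielding correction is not modeled.

```python
import numpy as np

BOX_HINV_MPC = 22.222   # comoving box side in h^-1 Mpc (from the text)
N_PIX = 4096            # pixels per side of the projected grid
Z = 3.0                 # redshift of the simulation output
H = 0.65                # dimensionless Hubble parameter h
LOG_NHI_DLA = 20.3      # DLA threshold, log10(N_HI / cm^-2)

# Pixel size in comoving h^-1 kpc, then converted to physical kpc.
pix_comoving_hinv_kpc = BOX_HINV_MPC * 1e3 / N_PIX
pix_physical_kpc = pix_comoving_hinv_kpc / H / (1.0 + Z)


def select_dla_pixels(log_nhi_map, threshold=LOG_NHI_DLA):
    """Return the (row, col) indices of pixels above the DLA threshold."""
    return np.argwhere(np.asarray(log_nhi_map) > threshold)


# Toy 4x4 map with invented values, standing in for the 4096^2 grid.
toy_map = np.full((4, 4), 17.0)
toy_map[1, 2] = 20.5
toy_map[3, 0] = 21.1
dla_pixels = select_dla_pixels(toy_map)
```

With the quoted box size this gives roughly 5.43 $h^{-1}$ kpc comoving, or about 2.1 kpc physical at $z = 3$ for $h = 0.65$.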

The left panel of Figure 3 shows the mass probability distribution of all the resolved
galaxies. The line shows the halo mass distribution obtained from the Press-Schechter (PS)
formalism (Mo & White 2002). The mean mass (logarithmic average) for all the 651 galaxies is
shown ($\langle \log M_h(M_\odot) \rangle = 11.57$). The right panel of Figure 3 shows the DLA
halo mass distribution. The halo mass of a given DLA was obtained by matching the projected DLA
positions (2-D) with those of the resolved halos. The projected distance distribution (between
halos and DLAs) peaks at 8 kpc, with a tail to $\lesssim 20$ kpc (physical units; see also
Gardner et al. 2001), and there is very little ambiguity in identifying the halo of a DLA.
Practically all the DLAs reside in halos with more than 64 dark matter particles. Note that, at
$z = 0$, the DLA distribution appears to be broadly peaked at around $V_{\rm rot} = 110$ km
s$^{-1}$ (Zwaan et al. 2005) and is even broader with respect to luminosity
(Rosenberg & Schneider 2003).

As mentioned, the purpose of this paper is to show that cross-correlation techniques will
uniquely constrain the mean of this distribution, but will not constrain its shape. We refer the
reader to Gardner et al. (2001) and Nagamine et al. (2004) for a detailed discussion of the DLA
halo mass distribution in numerical simulations. Typically, in order to match the observed DLA
statistics, they require an extrapolation of the DLA mass function below the mass resolution.
Here we make no attempt to include halos smaller than our resolution, since doing so would
require putting in the appropriate cross-correlation signal by hand.

3. CORRELATION FUNCTIONS IN HIERARCHICAL 
MODELS 

In this section, we describe the fundamental clustering 
relations necessary to understand how one can determine 
the halo mass of DLAs. 

A widely used statistic to measure the clustering of galaxies is the galaxy autocorrelation
function $\xi_{gg}(r)$. Similarly, one can define the cross-correlation $\xi_{dg}$ between DLAs
and galaxies from the conditional probability of finding a galaxy in a volume $dV_2$ at a
distance $r = |\mathbf{r}_2 - \mathbf{r}_1|$, given that there is a DLA at $\mathbf{r}_1$,

$$P(\mathrm{LBG}\,|\,\mathrm{DLA}) = n_u \left[1 + \xi_{dg}(r)\right] dV_2, \qquad (1)$$

where $n_u$ is the unconditional background galaxy density, i.e., the density when $\xi = 0$.

At a given redshift, the autocorrelation and cross-correlation functions are related to the dark
matter correlation function $\xi_{\rm DM}$ through the mean bias $\bar b(M)$ by

$$\xi_{gg}(r) = \bar b^2(M_{\rm gal})\, \xi_{\rm DM}(r), \qquad (2)$$
$$\xi_{dg}(r) = \bar b(M_{\rm DLA})\, \bar b(M_{\rm gal})\, \xi_{\rm DM}(r), \qquad (3)$$
Fig. 1. Left: SFR as a function of halo mass (DM + baryons) $M_h$. The streak of points at
$10^{13}\,M_\odot$ corresponds to several resolved galaxies. The line shows the running mean (in
$\log M$) with a decreasing SFR threshold. The filled circles show the SFR threshold vs. the
mean mass ($\langle \log M \rangle$) of the six subsamples. The six subsamples are the 7, 25,
50, 100, 200, and 400 most star-forming galaxies, labeled 1-6. Right: SFR as a function of the
baryonic mass $M_b$.



where $M_{\rm gal}$ is the mean galaxy halo mass, $M_{\rm DLA}$ is the mean DLA halo mass, and
$\bar b(M)$ is given by

$$\bar b(M) = \int_M^\infty p(M')\, b(M')\, dM', \qquad (4)$$

where $p(M)$ is the halo mass probability distribution and $b(M)$ is the bias function, which
can be computed using the extended PS formalism (e.g., Mo et al. 1996; Mo & White 2002). Thus,
from equations (2) and (3), if both $\xi_{dg}$ and $\xi_{gg}$ are power laws
($\xi \propto r^{-\gamma}$) with the same slope $\gamma$, the amplitude ratio of the cross- to
autocorrelation is a measurement of the bias ratio $\bar b(M_{\rm DLA})/\bar b(M_{\rm gal})$,
from which one can infer the halo mass ratio $M_{\rm DLA}/M_{\rm gal}$. The details are
presented in section 4.2. Briefly, given that $b(M)$ is a monotonically increasing function of
$M$, if $\xi_{dg}$ is greater (smaller) than $\xi_{gg}$, then the halos of DLAs are more (less)
massive than those of the galaxies.

In the remainder of this work, we use only projected correlation functions, $w(r_\theta)$,
where $r_\theta = D_A(1+z)\theta$ is in comoving Mpc and $D_A$ is the angular diameter
distance. This is necessary since (1) the gas column density distribution is a 2-D quantity,
and (2) this corresponds to the situation when one relies on photometric redshifts (e.g.,
Bouché & Lowenthal 2003; BL04). The projected correlation function $w(r_\theta)$ is directly
related to the spatial correlation function $\xi(r)$ if the selection function is known.
Following Phillipps et al. (1978) and Budavári et al. (2003), the projected autocorrelation
function, $w_{gg}$, of galaxies with a redshift distribution $dN/dz$ is

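$$w_{gg}(r_\theta) = \int dz \left(\frac{dN}{dz}\right)^{2} g(z)^{-1} \times \left(f(z)\,\theta\right)^{1-\gamma} r_{0,gg}^{\gamma}\, H_\gamma, \qquad (5)$$

where $dN/dl$ is the galaxy redshift distribution in physical units, $f(z) = D_A(1+z)$ is the
comoving line-of-sight distance, $g(z) = dr/dz = c/H(z)$, and
$H_\gamma = \Gamma(1/2)\,\Gamma([\gamma-1]/2)/\Gamma(\gamma/2)$ (see Appendix A). The projected
cross-correlation $w_{dg}$ between a given absorber at a given redshift and the galaxies (with
a distribution $dN/dz$) is

$$w_{dg}(r_\theta) = \int \frac{dN}{dl}\, \xi_{dg}\!\left(\sqrt{r_\theta^{2} + l^{2}}\right) dl. \qquad (6)$$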

For galaxies distributed in a top-hat redshift distribution $dN/dl$ of width $W_z$ [normalized
such that $\int (dN/dl)\,dl = 1$], as in the case here, equations (5) and (6) imply that the
amplitudes of both $w_{dg}(r_\theta)$ and $w_{gg}(r_\theta)$ are inversely proportional to
$W_z$ (see Appendix A for the derivations):

$$w_{gg}(r_\theta) = r_\theta^{1-\gamma}\, r_{0,gg}^{\gamma}\, H_\gamma \times \left(\frac{1}{W_z}\right)^{2} W_z, \qquad (7)$$

$$w_{dg}(r_\theta) = r_\theta^{1-\gamma}\, r_{0,dg}^{\gamma}\, H_\gamma \times \frac{1}{W_z}. \qquad (8)$$

Therefore, the ratio of the amplitudes of the two projected correlation functions $w_{dg}$ to
$w_{gg}$ is simply $(r_{0,dg}/r_{0,gg})^{\gamma}$, or the bias ratio
$\bar b(M_{\rm DLA})/\bar b(M_{\rm gal})$, from which we infer the mean DLA halo mass,
regardless of the redshift distribution. This is an important result for surveys that rely on
photometric redshifts: the ratio of the projected correlations is a true measure of the bias
ratio, regardless of contamination or uncertainty in the actual redshift distribution, provided
that the same galaxies are used for $w_{dg}$ and $w_{gg}$.
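The cancellation behind this result can be written out in one line. The following is a sketch that assumes power-law correlation functions of the form $\xi(r) = (r/r_0)^{-\gamma}$ with a common slope $\gamma$, which is the usual convention and is consistent with the $r_0^{\gamma}$ factors above:

```latex
\begin{align}
\frac{w_{dg}(r_\theta)}{w_{gg}(r_\theta)}
  &= \frac{r_\theta^{1-\gamma}\, r_{0,dg}^{\gamma}\, H_\gamma\, W_z^{-1}}
          {r_\theta^{1-\gamma}\, r_{0,gg}^{\gamma}\, H_\gamma\, W_z^{-1}}
   = \left(\frac{r_{0,dg}}{r_{0,gg}}\right)^{\gamma}
   = \frac{\xi_{dg}(r)}{\xi_{gg}(r)} \\
  &= \frac{\bar b(M_{\rm DLA})\,\bar b(M_{\rm gal})\,\xi_{\rm DM}(r)}
          {\bar b^{2}(M_{\rm gal})\,\xi_{\rm DM}(r)}
   = \frac{\bar b(M_{\rm DLA})}{\bar b(M_{\rm gal})} .
\end{align}
```

The factors $r_\theta^{1-\gamma}$, $H_\gamma$, and $W_z^{-1}$ are common to both functions only when the same galaxy sample enters $w_{dg}$ and $w_{gg}$, which is why that condition is required.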

4. RESULTS 

In section 4.1 we quantify the amplitude of the DLA-galaxy cross-correlation relative to the
galaxy-galaxy autocorrelation in the SPH simulations. We show how to
Fig. 2. (a) Column density map of H I in the $(22.222\,h^{-1}$ Mpc$)^3$ volume projected along
the x-axis on a $4096^2$ pixel grid. Potential DLAs with $N_{\rm HI} > 10^{20.3}$ cm$^{-2}$
appear black. (b) Position of potential DLAs projected along the x-axis. (c) Position of the
651 galaxies that have a baryonic mass $M_b$ larger than the resolution
$6.8 \times 10^{9}\,M_\odot$. (d) Position of the 100 most star-forming galaxies. The red
crosses show the positions of the seven most star-forming galaxies.



invert the cross-correlation results into a mass constraint in section 4.2. We show that this
method is independent of the galaxy sample that one uses (§ 4.3). Finally, we compare these
results to observational results in section 4.4.

4.1. DLA-Galaxy Cross-Correlation

The filled circles in Figure 4 show the DLA-galaxy cross-correlation $w_{dg}$ using the entire
sample of 115,000 DLAs and the 651 resolved galaxies. We computed $w_{dg}(r_\theta)$ with the
estimator

$$1 + w_{dg}(r_\theta) = \left\langle \frac{N_{\rm obs}(r_\theta)}{N_{\rm exp}(r_\theta)} \right\rangle, \qquad (9)$$

where $N_{\rm obs}(r_\theta)$ is the observed number of galaxies between $r_\theta - dr/2$ and
$r_\theta + dr/2$ from a DLA and $N_{\rm exp}(r_\theta)$ is the expected number of galaxies if
they were uniformly distributed, i.e., $N_{\rm exp}(r_\theta) = 2\pi r_\theta \Sigma_g\, dr$,
where $\Sigma_g$ is the galaxy surface density. $\langle\,\rangle$ denotes the average over the
number of selected DLAs ($N_{\rm DLA}$). In counting the pairs, we took into account the
periodic boundary conditions of the simulations.
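A pair-counting estimator of this kind can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the code used for the paper: positions are taken to be projected 2-D coordinates in a periodic square of side `box`, the function name and arguments are hypothetical, and the integral-constraint correction is not applied.

```python
import numpy as np


def projected_cross_correlation(dla_xy, gal_xy, box, r_edges):
    """Estimate w_dg in annuli around each DLA and average over DLAs.

    dla_xy, gal_xy : (N, 2) arrays of projected positions
    box            : side length of the periodic square
    r_edges        : annulus edges, all smaller than box / 2
    """
    dla_xy = np.asarray(dla_xy, dtype=float)
    gal_xy = np.asarray(gal_xy, dtype=float)
    r_edges = np.asarray(r_edges, dtype=float)
    sigma_g = len(gal_xy) / box**2                       # galaxy surface density
    # Exact annulus area; equals 2*pi*r*dr with r the bin centre.
    area = np.pi * (r_edges[1:] ** 2 - r_edges[:-1] ** 2)
    n_exp = sigma_g * area                               # expected counts
    ratios = np.empty((len(dla_xy), len(area)))
    for i, pos in enumerate(dla_xy):
        d = gal_xy - pos
        d -= box * np.round(d / box)                     # periodic wrap (minimum image)
        r = np.hypot(d[:, 0], d[:, 1])
        n_obs, _ = np.histogram(r, bins=r_edges)
        ratios[i] = n_obs / n_exp
    return ratios.mean(axis=0) - 1.0


# Tiny made-up configuration: one DLA at the origin, four galaxies.
toy_w = projected_cross_correlation(
    dla_xy=[[0.0, 0.0]],
    gal_xy=[[1.0, 0.0], [9.5, 0.0], [0.0, 3.0], [5.0, 5.0]],
    box=10.0,
    r_edges=[0.25, 0.75, 1.25],
)
```

In the toy call, the galaxy at $x = 9.5$ is wrapped to a separation of 0.5 and therefore falls in the first annulus, which is the behavior the periodic boundary treatment is meant to provide.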

There are several reasons not to use other estimators such as the Landy & Szalay (1993, LS)
estimator. First, we want to duplicate as closely as possible the method (and estimator) used
in the observations of Bouché & Lowenthal (2003) and BL04. But, more importantly, the LS
estimator is symmetric under the exchange of the galaxies with the absorbers, whereas here and
for the observations of BL04, the symmetry is broken. This is due to the absorber redshift
being well
Fig. 3. Left: Halo mass (DM + baryons) probability function $p(M)$ of all the SKID-identified
galaxies with baryonic masses larger than the resolution $6.8 \times 10^{9}\,M_\odot$,
corresponding to 64 SPH particles. For comparison, the curve is the dark matter mass function
from the extended Press-Schechter formalism of Mo & White (2002) in this cosmology, scaled
arbitrarily (i.e., not a fit). Right: DLA dark matter mass distribution,
$p_{\rm DLA}(M)$ ($\propto d^2N/dz\,d\log M$). This was found by matching the 2-D DLA positions
with the nearest resolved halo. The shape of the distribution will not be constrained by the
DLA-galaxy cross-correlation, but its first moment ($\langle \log M \rangle$) will be.

Fig. 4. Filled circles show the projected DLA-galaxy cross-correlation $w_{dg}(r_\theta)$ at
$z = 3$ in this $(22.222\,h^{-1}$ Mpc$)^3$ simulation. The solid triangles show the projected
autocorrelation $w_{gg}(r_\theta)$ (offset by 0.03 dex in the x-axis for clarity). The full
sample of 115,000 DLAs and 651 resolved galaxies was used. The amplitude ratio
$a = 0.73 \pm 0.08$, which is $w_{dg}/w_{gg}$, is found by fitting $w_{dg}$ to the model
$\hat w_{dg} = a\, \hat w_{gg}$, where $\hat w_{gg}$ is the fit to the galaxy-galaxy
autocorrelation (dashed line). The small panel shows the $1\sigma$ range (vertical lines) of
the $\chi^2$ distribution (solid line). An integral constraint of $C = 0.04$ was used.



known, while galaxies have photometric redshifts with larger uncertainties and, therefore, are
distributed along the line of sight. This broken symmetry is also fundamental in the derivation
of equations (5)-(8). Had we used spectroscopic redshifts and $\xi(r)$ instead of
$w(r_\theta)$, the LS estimator would be superior.

Given that we use the galaxy surface density $\Sigma_g$ to estimate the unconditional galaxy
density (see eq. [9]), the integral of $\Sigma_g(1 + w_{dg})$ over the survey area $A$ will be
equal to the total number of galaxies, i.e., $\int_A \Sigma_g (1 + w_{dg})\,dA = N_g$. As a
consequence, $\int_A \Sigma_g\, w_{dg}\,dA = 0$, and the correlation will be negative on the
largest scales, i.e., biased low. This is the known "integral constraint". In the case of our
$(22.222\,h^{-1}$ Mpc$)^2$ survey geometry, we estimated the integral constraint to be
$C = 0.04$, or 2% of the cross-correlation strength at $1\,h^{-1}$ Mpc. We added $C$ to
$w_{dg}$ estimated from equation (9).

The uncertainty in $w_{dg}$, $\sigma_w$, has two terms, the Poisson noise and the clustering
variance (see Eisenstein 2003 and references therein; Appendix B). In Appendix B we show that
$\sigma_w$ is proportional to $1/\sqrt{N_{\rm DLA}}$ (eq. [B11]).

There are several ways to compute $\sigma_w$ in practice. The proper way to compute $\sigma_w$
would be to resample the DLAs, since this would include the uncertainty due to the finite
number of lines of sight. However, this is valid only for independent lines of sight, as in the
case of an observational sample (provided that $N_{\rm DLA}$ is large, say greater than 10),
and will not be correct here given that we have only one simulation and that we have to use the
same galaxies for each simulated line of sight. The uncertainty $\sigma_w$ must then reflect
that we used only one realization of the large-scale structure. For this reason, we elected to
use the jackknife estimator (Efron 1982), i.e., dividing the $(22.222\,h^{-1}$ Mpc$)^2$ area
into nine equal parts and each time leaving one part out. This will accurately reflect the
uncertainty in $w_{dg}$ due to the one large-scale structure used, but the signal-to-noise
ratio (S/N $\equiv w_{dg}/\sigma_w$) will not increase with $\sqrt{N_{\rm DLA}}$ as expected
(Appendix B): it will saturate after a certain value of $N_{\rm DLA}$. We find that the S/N
indeed saturates at $N_{\rm DLA} \simeq 40$ (not shown). This is a major difference from
observational samples, where each field is independent. In that case, equation (B11) applies
and the S/N is proportional to $\sqrt{N_{\rm DLA}}$.

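We computed the full covariance matrix from the $N_{\rm jack} = 9$ realizations as

$$\mathrm{COV}_{ij} = \frac{N_{\rm jack} - 1}{N_{\rm jack}} \sum_{k=1}^{N_{\rm jack}} \left[w_k(r_{\theta_i}) - \bar w(r_{\theta_i})\right] \cdot \left[w_k(r_{\theta_j}) - \bar w(r_{\theta_j})\right], \qquad (10)$$

where $w_k$ is the $k$th measurement of the cross-correlation and $\bar w$ is the average of the
$N_{\rm jack}$ measurements of the cross-correlation. The error bars in Figure 4 show the
diagonal elements of the covariance matrix, i.e., $\sigma_w \equiv \sqrt{\mathrm{COV}_{ii}}$.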
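A leave-one-out covariance of this form is straightforward to compute once the $N_{\rm jack}$ measurements are stacked in an array. The sketch below assumes the standard jackknife normalization $(N_{\rm jack}-1)/N_{\rm jack}$; the function names and the toy numbers are illustrative only.

```python
import numpy as np


def jackknife_covariance(w_jack):
    """Covariance matrix from leave-one-out measurements.

    w_jack : (N_jack, N_bins) array; row k holds the correlation function
             measured with sub-area k removed.
    """
    w_jack = np.asarray(w_jack, dtype=float)
    n_jack = w_jack.shape[0]
    dev = w_jack - w_jack.mean(axis=0)       # deviations from the jackknife mean
    # Standard jackknife prefactor (assumed here): (N_jack - 1) / N_jack.
    return (n_jack - 1.0) / n_jack * dev.T @ dev


def sigma_w(cov):
    """1-sigma error bars: square root of the diagonal elements."""
    return np.sqrt(np.diag(cov))


# Made-up example with 3 jackknife samples and 2 radial bins.
toy_cov = jackknife_covariance([[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]])
toy_sigma = sigma_w(toy_cov)
```

With nine sub-areas and a comparable number of radial bins the resulting matrix is poorly conditioned, which is why its inversion needs special care in the fit.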

We computed the projected autocorrelation $w_{gg}(r_\theta)$ of the same simulated galaxies
used for $w_{dg}(r_\theta)$ in a similar manner. We used the estimator shown in equation (9) to
compute $w_{gg}(r_\theta)$, where $N_{\rm obs}(r)$ is now the number of galaxies between
$r - dr/2$ and $r + dr/2$ from another galaxy. The open triangles in Figure 4 show the
projected autocorrelation $w_{gg}(r_\theta)$ of the 651 galaxies.

We fitted the galaxy autocorrelation with a power law model
($\hat w_{gg} = A_{gg}\, r_\theta^{1-\gamma}$) by minimizing
$\chi^2 \equiv [w - \hat w]^T\, \mathrm{COV}^{-1}\, [w - \hat w]$, where $w$ and $\hat w$ are
the vector data and model, respectively, and $\mathrm{COV}^{-1}$ is the inverse of the
covariance matrix. We used singular value decomposition (SVD) techniques to invert the
covariance matrix, COV, since it is singular and the inversion is unstable (see discussion in
Bernstein 1994).

We then use that fit as a template to constrain the amplitude of $w_{dg}$, i.e.,

$$\hat w_{dg} = a \times \hat w_{gg}, \qquad (11)$$

where $a$ is the amplitude ratio $A_{dg}/A_{gg}$ of the correlation functions. This assumes
that the two correlation functions have the same slope (see § 3). This method also closely
matches the method used by BL04 (see § 4.4 below) and makes comparison to those observations
straightforward.
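Because the template model is linear in $a$, the $\chi^2$ minimum has a closed form, and the SVD-based inversion can be delegated to a pseudo-inverse. The sketch below is one possible way to do this and is not necessarily how the fit was performed for the paper; `fit_amplitude`, its arguments, and the toy inputs are assumptions made for illustration.

```python
import numpy as np


def fit_amplitude(w_dg, w_gg_model, cov, rcond=1e-10):
    """Best-fit a and its 1-sigma error for the model w_dg = a * w_gg_model.

    chi^2(a) = (d - a t)^T C^-1 (d - a t) is quadratic in a, so
    a = (t^T C^-1 d) / (t^T C^-1 t) and sigma_a = (t^T C^-1 t)^(-1/2).
    """
    d = np.asarray(w_dg, dtype=float)
    t = np.asarray(w_gg_model, dtype=float)
    # Moore-Penrose pseudo-inverse, computed through an SVD; singular values
    # below rcond times the largest one are discarded.
    cinv = np.linalg.pinv(np.asarray(cov, dtype=float), rcond)
    tct = t @ cinv @ t
    a = (t @ cinv @ d) / tct
    return a, 1.0 / np.sqrt(tct)


# Made-up template and data that are an exact multiple of it.
template = np.array([2.0, 1.0, 0.5])
a_fit, a_err = fit_amplitude(0.73 * template, template, 0.01 * np.eye(3))
```

The cutoff `rcond` controls how aggressively near-degenerate directions of the covariance matrix are removed; its value here is arbitrary and would need tuning on real data.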

The solid line in Figure 4 shows the fit to $w_{dg}$ using equation (11), where the best
amplitude $a$ is

$$a = 0.73 \pm 0.08. \qquad (12)$$

The top panel shows the $\chi^2(a)$ distribution with the $1\sigma$ range. In other words, the
bias ratio $\bar b(M_{\rm DLA})/\bar b(M_{\rm gal})$ is $0.73 \pm 0.08$. This can be converted
into a correlation length for $w_{dg}$ of $a^{1/1.8} \simeq 0.84$ times that of the galaxy
autocorrelation, i.e., $r_{0,dg} = 0.84\, r_{0,gg}$.
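The conversion from amplitude ratio to correlation-length ratio is a one-line calculation. In the sketch below the slope $\gamma = 1.8$ is read off the exponent $1/1.8$ used in the text; the propagated uncertainty is a first-order estimate added here for illustration and is not a number quoted in the paper.

```python
GAMMA = 1.8               # slope implied by the exponent 1/1.8 in the text
a, a_err = 0.73, 0.08     # measured amplitude ratio, eq. (12)


def length_ratio(amplitude_ratio, gamma=GAMMA):
    """r0_dg / r0_gg for two power laws sharing the slope gamma."""
    return amplitude_ratio ** (1.0 / gamma)


ratio = length_ratio(a)
# First-order error propagation: d(ratio)/da = ratio / (gamma * a).
ratio_err = ratio * a_err / (GAMMA * a)
print(f"r0_dg / r0_gg = {ratio:.2f} +/- {ratio_err:.2f}")
```

The result reproduces the factor of about 0.84 quoted above.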

Several authors (e.g., Berlind & Weinberg 2002; Berlind et al. 2003 and references therein)
have shown that the small scales ($r < 1$ Mpc) of the correlation function are the scales
sensitive to variations in the halo occupation number. At those scales, $\xi(r)$ is very
susceptible to galaxy pairs that are in the same halo. Therefore, when we repeated our analysis
with the six subsamples, we restricted ourselves to $r_\theta > 1\,h^{-1}$ Mpc. In this case,
for the full sample, we find the amplitude ratio to be $a = 0.70 \pm 0.18$, in good agreement
with equation (12).

The reader should not use these results (e.g., eq. [12]), obtained with 651 galaxies and
115,000 DLAs, to scale the errors to smaller samples, because we use the same large-scale
structure for all the 115,000 simulated DLAs. As mentioned earlier, the large-scale structure
dominates the uncertainty at large $N_{\rm DLA}$, and this is seen in the fact that the S/N
saturates after $N_{\rm DLA} \simeq 40$. We come back to this point at the end of § 4.4.

4.2. The Mass of DLA Halos from the Amplitude of $w_{dg}$

Equation (12), i.e., the bias ratio $\bar b(M_{\rm DLA})/\bar b(M_{\rm gal})$, can be
converted into a mean halo mass for DLAs if one knows the functional form of $b(M)$ and
$M_{\rm gal}$. One can use the PS formalism (e.g., Mo & White 2002) or the autocorrelation of
several galaxy subsamples to constrain the shape of $b(M)$. We refer to these as the
"theoretical method" and as the "empirical method," respectively.

4.2.1. Theoretical Biases $\bar b(M)$

One can compute the theoretical biases for any population (eq. [4]) and predict the bias ratio
a priori if the mass probability distribution $p(M)$ is known. Naturally, $p(M)$ is known in
our simulation (Fig. 3). We show that the predicted bias ratio is well within the $1\sigma$
range of our results (eq. [12]), demonstrating the reliability of the method.

Given that galaxies and the DLAs actually lie in halos of different masses, the theoretical
biases are found from equation (4), i.e.,

$$\bar b_{\rm DLA}(>M) = \int_M^\infty p_{\rm DLA}(M')\, b(M')\, d\log M', \qquad (13)$$

$$\bar b_{\rm gal}(>M) = \int_M^\infty p_{\rm gal}(M')\, b(M')\, d\log M', \qquad (14)$$

where $p(M)$ is the appropriate mass distribution [$p(M) \equiv dN/d\log M$, normalized such
that $\int p(M)\,d\log M = 1$] and $b(M)$ is the bias of halos of a given mass $M$. The bias
function $b(M)$ is also a function of redshift $z$, i.e., $b(M, z)$, and can be computed at a
given $z$ from the extended PS formalism (e.g., Mo & White 2002). It is shown in Figure 5 for
$z = 3$ on a linear-linear (left) and log-linear (right) plot.

The mass distributions $p_{\rm DLA}$ and $p_{\rm gal}$ are shown in Figure 3. Because $p(M)$ is
bounded at some low-mass limit $M_{\rm min}$ (due to limited resolution or to observational
selection), the mean bias $\bar b$ of a given galaxy sample is defined by
$\bar b_{\rm gal} = \bar b(> M_{\rm min})$.
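Numerically, the mean bias is a weighted average of $b(M)$ over the binned mass distribution. The sketch below is a discretized version under stated assumptions: the bins are equally spaced in $\log M$, the distribution is renormalized above the lower cut, and `toy_bias` is an invented linear function standing in for the extended PS bias, which is not reproduced here.

```python
import numpy as np


def mean_bias(log_m, p, bias_fn, log_m_min=-np.inf):
    """Discretised mean bias for a population with mass distribution p.

    log_m     : bin centres in log10(M), equally spaced
    p         : distribution sampled at log_m (any normalisation)
    bias_fn   : callable returning b for an array of log10(M) values
    log_m_min : lower mass cut, e.g. the resolution limit
    """
    log_m = np.asarray(log_m, dtype=float)
    p = np.asarray(p, dtype=float)
    keep = log_m >= log_m_min
    weights = p[keep] / p[keep].sum()        # enforce sum(p * dlogM) = 1
    return float(np.sum(weights * bias_fn(log_m[keep])))


def toy_bias(log_m):
    """Hypothetical linear bias, b = log10(M) - 9; NOT the PS result."""
    return np.asarray(log_m) - 9.0


toy_mean = mean_bias([11.0, 12.0], [3.0, 1.0], toy_bias)
```

For a bias that is linear in $\log M$, as in the toy function, the mean bias equals the bias evaluated at $\langle \log M \rangle$ (here $\langle \log M \rangle = 11.25$).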

The predicted biases $\bar b$ are shown in the left panel of Figure 6. The predicted biases
$\bar b$ for the subsamples, the 651 galaxies, and the DLAs are represented by open squares,
the filled square, and the filled circle, respectively. From $\bar b$ for the 651 galaxies
(filled square) and for DLAs (filled circle), the theoretical bias ratio
$\bar b_{\rm DLA}/\bar b_{\rm gal}$ is found to be 0.771, very close to the bias ratio measured
from the clustering of galaxies around the DLAs (eq. [12]).

When the distributions $p(M)$ are not known, we can infer a mass ratio from the bias ratio
using the approximation $b(M) = b_0 + b_1 \log M$,$^1$ over a restricted mass range. In each
panel in Figure 5, the dashed line shows such a linear fit over the mass range
$\log M \simeq 11$-$12.5$.

$^1$ One can use $b(M) = b'_0 + b'_1 M$ instead, and replace $\langle \log M \rangle$ by
$\langle M \rangle$ in the remainder of the discussion.
Fig. 5. Left: The z = 3 bias b(M) as a function of halo mass M from the extended Press-Schechter theory, as in Mo & White (2002). 
Right: Same as left panel in log M space. In both panels, the dashed line is a linear fit to the curve over the mass range $\log M \simeq 11.5$-12.5. 
Fig. 6. Left: The symbols show the mean bias $\bar{b}$ as a function of the mean halo mass $\langle \log M_h \rangle$ for the DLAs (circle), the full galaxy 
sample (filled square), and the subsamples (open squares), labeled 1-6. The bias is calculated using equations 13-14 and the distributions 
$p_{\rm DLA}$ and $p_{\rm gal}$ (shown in Fig. 3). From the filled symbols, the predicted bias ratio is found to be 0.771. The solid line is a linear fit 
to the points and is 5% away from the linear approximation (Fig. 5) shown by the thick dashed line. Right: Same as left panel, 
but using the empirical method (see text) instead of the mass distributions. The triangles show the mean bias for the different galaxy 
samples using $b \propto \sqrt{A_{gg}}$ (eq. 5), where the normalization was adjusted to match the amplitude of the dashed line. The bias for the 651 
galaxies is represented by the filled triangle. The open triangles show $\bar{b}$ for the subsamples. The right axis scale shows the corresponding 
correlation length $r_0$. For the DLAs, the shaded areas show the measured bias ratio a = 0.73 ± 0.08 and the corresponding halo mass range 
$\langle \log M_{\rm DLA}(M_\odot) \rangle = 11.13^{+0.13}_{-0.13}$ using the mean mass of the 651 resolved galaxies: $\langle \log M_{\rm gal}(M_\odot) \rangle = 11.57$. In both panels, the vertical 
dashed line shows the "true" mean $\langle \log M_{\rm DLA}(M_\odot) \rangle = 11.16$ obtained from the DLA mass distribution $p_{\rm DLA}(M)$ shown in Figure 3. Both 
panels are for z = 3. 






Using this approximation, the mean bias $\bar{b}$ is given by 

$$\bar{b}(>M_{\rm min}) = \int_{M_{\rm min}}^{\infty} p(M')\, b(M')\, d\log M' 
 = b_0 + b_1 \int_{M_{\rm min}}^{\infty} p(M')\, \log M'\, d\log M' 
 = b_0 + b_1 \langle \log M \rangle , \qquad (15)$$

where $\langle \log M \rangle$ is the first moment of the distribution p(M). Thus, the mean biases 
for the galaxies and the DLAs are $\bar{b}_{\rm gal} = b(\langle \log M_{\rm gal} \rangle)$ and 
$\bar{b}_{\rm DLA} = b(\langle \log M_{\rm DLA} \rangle)$, respectively, where 
$\langle\,\rangle$ denotes the first moment of the appropriate mass distribution. 
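
The identity in equation 15 can be checked numerically. The sketch below is only illustrative: it assumes a Gaussian p(M) in log M and hypothetical coefficients `B0` and `B1` (neither is taken from the simulations), integrates p(M) b(M) on a grid, and compares the result with $b_0 + b_1 \langle \log M \rangle$.

```python
import math


def mean_bias_linear(b0, b1, mu, sigma, n=20001, half_width=8.0):
    """Trapezoid-rule estimate of <b> and <log M> for b = b0 + b1*logM.

    p(M) is an assumed Gaussian in log M with mean `mu` and width `sigma`;
    it is normalized numerically, as required by equation 15.
    """
    lo = mu - half_width * sigma
    hi = mu + half_width * sigma
    step = (hi - lo) / (n - 1)
    norm = first_moment = bias = 0.0
    for i in range(n):
        log_m = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end weights
        p = math.exp(-0.5 * ((log_m - mu) / sigma) ** 2)
        norm += weight * p
        first_moment += weight * p * log_m
        bias += weight * p * (b0 + b1 * log_m)
    return bias / norm, first_moment / norm


# Hypothetical linear-bias coefficients, chosen only for illustration.
B0, B1 = -14.2, 1.44
mean_b, mean_log_m = mean_bias_linear(B0, B1, mu=11.2, sigma=0.4)
print(mean_b, B0 + B1 * mean_log_m)  # the two numbers should agree
```

Because b(M) is linear in log M, the agreement does not depend on the shape assumed for p(M); only its first moment matters.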

In Figure 6 (left) the solid line is a linear fit to the theoretical biases 
$b(\langle \log M \rangle)$, and is 5% away (in amplitude) from the linear approximation (eq. 15) 
shown by the thick dashed line. The vertical dashed line indicates the mean DLA halo mass 
$\langle \log M_{\rm DLA} \rangle$ that is found from the first moment of the mass distribution in 
Figure 3. This shows that using a linear approximation of b(M) is equivalent to using the bias 
function b(M) from Mo & White (2002), provided that the DLA-galaxy mass ratio is not larger than a 
decade. Indeed, the 5% difference in amplitude cancels out when taking the bias ratio. 

4.2.2. Empirical Method for b(M) 

To infer $\langle \log M_{\rm DLA} \rangle$ from equation 15 or from observations, one needs to 
find the coefficients $b_0$ and $b_1$. To do so, one can either use the PS formalism (Mo & White 
2002) or use the fact that b is proportional to $\sqrt{A_{gg}}$ (eq. 5), where $A_{gg}$ is 
measured for each of the galaxy subsamples covering the mass range $\log M \simeq 11.5$-12.5. 
Figure 6 (right) illustrates this point. The thick dashed line is again the linear approximation 
shown in Figure 5. The open (filled) triangles show the mean biases $\bar{b}$ of the subsamples 
(full sample) assuming that $b \propto \sqrt{A_{gg}}$ (eq. 5). The normalization is set to match 
the dashed line, and is not relevant, since we measure a ratio of two biases. This shows that one 
can use either the PS formalism (Mo & White 2002) or $\sqrt{A_{gg}}$ to find the coefficients 
$b_0$ and $b_1$. 

In the case where the autocorrelation length $r_{0,gg}$ has been determined, one can use the right 
y-axis scale of Figure 6 to infer the DLA halo mass from the measured bias ratio. 

4.2.3. The Mean DLA Halo Mass 

To actually determine $\langle \log M_{\rm DLA} \rangle$ from our cross-correlation result 
(eq. 12), we used (1) the linear approximation to the PS bias (Fig. 6, thick dashed line), and 
(2) $\langle \log M_{\rm gal} \rangle = 11.57$ for the 651 galaxies. We infer a mean DLA halo mass 
of $\langle \log M_{\rm DLA} \rangle = 11.13^{+0.13}_{-0.13}$, shown by the vertical shaded area 
in the right panel of Figure 6. Our cross-correlation result (eq. 12) is shown by the horizontal 
shaded area. The "true" DLA mass 
derived from $p_{\rm DLA}$ (Fig. 3) is shown by 
the vertical dashed line at $\langle \log M_{\rm DLA} \rangle = 11.16$. Similarly, 
using fits to b(M) in linear space (Fig. El left), we infer 
(Mdla) = 2.12lJ q 6 x 10 11 M , close to "true" mean 
l/A^DLAE^Wa = 3.94 x 10 11 M . 

In summary, the amplitude of $w_{dg}$ relative to $w_{gg}$, a = 0.73 ± 0.08 (eq. 12), measured in 
this simulation implies that DLAs have halos of (logarithmically) averaged mass 

$$\langle \log M_{\rm DLA}(M_\odot) \rangle = 11.13^{+0.13}_{-0.13} , \qquad (16)$$

close to the true 11.16. This shows that the cross-correlation technique uniquely constrains the 
mean of the halo mass distribution, despite the fact that DLAs occupy a range of halo masses and 
some halos contain multiple galaxies and multiple DLA systems. In § 4.3 we show that the technique 
is reliable in the sense that it will lead to the same answer regardless of the galaxy sample 
used. 
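
The inversion from the measured amplitude ratio to a mean halo mass can be sketched in a few lines. The linear coefficients are not quoted in the text, so the sketch fixes them from two anchor points that are assumptions here: b ≈ 1.75 at log M = 11.13 and b ≈ 3 at log M = 12, the approximate values read off the figure later in this section. Since the first anchor uses the DLA result itself, this only illustrates the arithmetic and is not an independent derivation.

```python
def linear_bias_coeffs(log_m_a, bias_a, log_m_b, bias_b):
    """Coefficients (b0, b1) of b = b0 + b1*logM through two anchor points."""
    b1 = (bias_b - bias_a) / (log_m_b - log_m_a)
    b0 = bias_a - b1 * log_m_a
    return b0, b1


def infer_log_mass(amp_ratio, log_m_gal, b0, b1):
    """Invert amp_ratio = b(<logM_DLA>) / b(<logM_gal>) for <logM_DLA>."""
    bias_gal = b0 + b1 * log_m_gal
    return (amp_ratio * bias_gal - b0) / b1


# Assumed anchor points (approximate figure read-offs, see lead-in).
b0, b1 = linear_bias_coeffs(11.13, 1.75, 12.0, 3.0)

LOG_M_GAL = 11.57          # mean mass of the 651 galaxies (from the text)
RATIO, RATIO_ERR = 0.73, 0.08  # measured amplitude ratio (eq. 12)

central = infer_log_mass(RATIO, LOG_M_GAL, b0, b1)
low = infer_log_mass(RATIO - RATIO_ERR, LOG_M_GAL, b0, b1)
high = infer_log_mass(RATIO + RATIO_ERR, LOG_M_GAL, b0, b1)
print(f"<log M_DLA> = {central:.2f} (+{high - central:.2f} / -{central - low:.2f})")
```

With these assumed anchors the result lands within about 0.01 dex of equation 16, with a similar uncertainty.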

From the right panel of Figure 6 we can now predict the cross-correlation strength for real z = 3 
LBGs, which have a correlation length of $r_{0,gg} \simeq 4\,h^{-1}$ Mpc (e.g., ASSP03; 
Adelberger et al. 2004), corresponding to a halo mass of $M_h \simeq 10^{12}\,M_\odot$. From the 
figure, one expects that the correlation ratio or the bias ratio is $\simeq 1.75/3 = 0.58$, and 
thus the DLA-LBG cross-correlation would have a correlation length 
$r_{0,dg} = 4 \times (0.58)^{1/1.6} \simeq 2.85\,h^{-1}$ Mpc. 
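
This conversion follows from the power-law form of the projected correlation functions: their amplitude ratio equals $(r_{0,dg}/r_{0,gg})^{\gamma}$. A minimal sketch, assuming the slope γ = 1.6 implied by the exponent used above (the function name is illustrative):

```python
def cross_correlation_length(r0_gg, bias_ratio, gamma=1.6):
    """Return r0_dg given r0_gg and the amplitude (bias) ratio.

    Assumes both correlation functions are power laws with the same
    slope gamma, so that w_dg / w_gg = (r0_dg / r0_gg) ** gamma.
    """
    return r0_gg * bias_ratio ** (1.0 / gamma)


r0_dg = cross_correlation_length(r0_gg=4.0, bias_ratio=1.75 / 3.0)
print(f"r0_dg = {r0_dg:.2f} h^-1 Mpc")
```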

Potential systematic errors include the few massive halos ($M_h > 10^{13}\,M_\odot$) that are 
missed due to the limited volume ($22.222^3\,h^{-3}$ Mpc$^3$) of our simulation. However, since 
DLAs are cross section selected, these few massive halos are too scarce to change the mean 
$\langle \log M_{\rm DLA} \rangle$ of the DLA mass probability distribution (Fig. 3, right). 
Naturally, if there were such massive halos in our simulations, the amplitude of the 
cross-correlation would be different. We address this point in a general way in § 4.3 and show 
that the derived $\langle \log M_{\rm DLA} \rangle$ is independent of the galaxy sample used. 

Our treatment of feedback is limited to energy injection of supernovae, and thus does not treat 
phenomena like winds. Nagamine et al. (2004) included winds in similar simulations and showed that 
the DLA abundance decreases with increasing wind strength, but the mean DLA halo mass will be 
shifted towards higher mass in the presence of winds. Nagamine et al. (2004) also showed that the 
DLA abundance (extrapolated to $10^8\,M_\odot$, i.e., below the resolution limit, using the PS 
formalism) also decreases with increasing resolution, but again, the mean DLA halo mass will be 
shifted toward higher mass in higher resolution runs. 

Given that (1) a fraction of DLAs are expected to arise in halos below our mass resolution of 
$M_h > 5.2 \times 10^{10}\,M_\odot$ and (2) our total DLA abundance extrapolated to 
$10^8\,M_\odot$, as in Nagamine et al. (2004), over-predicts the observed DLA abundance, 
equation 16 is an upper limit. Furthermore, given that the results of Nagamine et al. (2004) 
showed that both winds and better resolution increase the mean DLA halo mass, we conclude that a 
simulation with SNe winds and with a better mass resolution would lower our mean DLA mass. Reading 
from Figure 5 in Nagamine et al. (2004), we estimate that $\langle \log M_{\rm DLA} \rangle$ is 
10.6 in their high-resolution run with strong winds, or a factor of $\simeq 5$ smaller than here. 
Thus, equation 16 is an upper limit. 

4.3. The Cross-Correlation Is Independent of the Galaxy Sample 

From the bias equations above, we expect the relative amplitude a to vary as a function of the 
halo mass of the galaxy sample, $M_h$. We therefore performed the same cross-correlation 
calculations for each of the six subsamples presented earlier, and ask the question: is the 
inferred $\langle M_{\rm DLA} \rangle$ the same in each case? We restricted ourselves to scales 
$r_\theta > 1\,h^{-1}$ Mpc (from the discussion in § 4.1). 

Fig. 7. DLA-LBG cross- to autocorrelation amplitude ratio, a (defined in eq. 11), as a function of 
the mean galaxy halo mass $\langle \log M_h \rangle$ using the full DLA sample of 
$\simeq 115,000$ lines of sight. Note that a is also $b(M_{\rm DLA})/b(M_{\rm gal})$. Each of the 
subsamples is labeled 1-6. The filled circle with solid error bars shows a for the entire sample 
of 651 galaxies and 115,000 DLAs. The filled circle with dotted error bars (offset along the 
x-axis) shows a fitted over all scales for the entire sample of 651 galaxies and 115,000 DLAs, 
from which we infer a mean DLA halo mass of 
$\langle \log M_{\rm DLA} \rangle = 11.13^{+0.13}_{-0.13}$. For this mass, the expected amplitude 
ratio a for the six subsamples is shown by the solid line. The expected a closely follows the 
values found for the subsamples, showing that one will get the same DLA halo mass for any galaxy 
subsample, as long as the same galaxies are used for $w_{gg}$ and $w_{dg}$, i.e., that the method 
is reliable and self-consistent. For comparison, the dashed lines show the expected a if the mean 
DLA halo mass $\langle \log M_{\rm DLA} \rangle$ were 10.5, 11.5, and 12 (from bottom to top) 
instead. 

Figure 7 shows the measured amplitude or bias ratio a for each of the subsamples. The amplitude 
ratio a for the subsamples (full sample) is represented by the open squares (filled circle) with 
solid error bars. The filled circle with dotted error bars represents the full sample, from which 
we inferred a = 0.73 ± 0.08 and $\langle \log M_{\rm DLA} \rangle = 11.13^{+0.13}_{-0.13}$. As 
expected, a increases with larger subsamples, or with decreasing galaxy halo mass $M_h$. 

For the method to be self-consistent, the derived DLA halo mass 
$\langle \log M_{\rm DLA} \rangle$ should be the same for all the subsamples. Given the bias 
relation, $\langle \log M_{\rm DLA} \rangle = 11.13$ determined in § 4.2, and a mean galaxy halo 
mass $\langle \log M_h \rangle$, we can predict the bias ratio a. The solid line in Figure 7 
represents this prediction. One sees that the measured bias ratios for the subsamples (open 
squares) follow the expected bias ratio (solid line). For comparison, the dashed lines show the 
expected amplitude ratios a if DLAs were in halos of mean mass 
$\langle \log M_{\rm DLA} \rangle = 10.5$, 11.5, and 12 (from bottom to top) instead of the 
inferred $\langle \log M_{\rm DLA} \rangle = 11.13$. 
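
The forward prediction behind such curves is a one-line ratio once a linear b(M) is adopted. The sketch below uses hypothetical coefficients `B0` and `B1` (assumptions, not values from the paper) to tabulate the expected ratio against the mean galaxy halo mass for several candidate DLA masses.

```python
# Hypothetical coefficients of the linear approximation b = B0 + B1*logM.
B0, B1 = -14.24, 1.437


def linear_bias(log_mass, b0=B0, b1=B1):
    """Linear approximation to the halo bias in log M."""
    return b0 + b1 * log_mass


def predicted_ratio(log_m_dla, log_m_gal):
    """Expected amplitude ratio a = b(<logM_DLA>) / b(<logM_gal>)."""
    return linear_bias(log_m_dla) / linear_bias(log_m_gal)


galaxy_masses = [11.5, 11.75, 12.0, 12.25, 12.5]
curves = {
    log_m_dla: [predicted_ratio(log_m_dla, g) for g in galaxy_masses]
    for log_m_dla in (10.5, 11.13, 11.5, 12.0)
}
for log_m_dla, ratios in curves.items():
    print(log_m_dla, [round(r, 2) for r in ratios])
```

Each curve falls with increasing galaxy halo mass, and at fixed galaxy mass the ratio rises with the assumed DLA mass, which is why matching all subsamples with a single curve is a meaningful consistency test.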

We conclude that the method is reliable and self-consistent, i.e., the mass 
$\langle \log M_{\rm DLA} \rangle$ is independent of the galaxy sample used, that the clustering 
statistics of DLAs with galaxies can be used to infer their mass, and that large observational 
samples will shed new light on their nature. A direct observational measure of the relative 
amplitude a (J. Cooke 2005, private communication) will show whether or not DLAs are massive disks 
($10^{12}\,M_\odot$), as proposed by Wolfe et al. (1986, 1995) and Prochaska & Wolfe (1997b). 

4.4. Comparison to Observations 

In this section, we first briefly review past and recent observations of clustering between 
galaxies and DLAs (§ 4.4.1). We then (§ 4.4.2) focus on comparing the simulated DLA-LBG 
cross-correlation to the observational results of BL04 in a meaningful way, i.e., with a sample of 
similar size. 

4.4.1. Observations of the DLA-Galaxy 
Cross-Correlation at z = 3 

Early attempts to detect diffuse Lyα emission from DLAs at z > 2 using deep narrow-band imaging 
(Lowenthal et al. 1995) did not reveal the absorber but unveiled a few companion Lyα emitters 
(Lowenthal et al. 1991), hinting at the clustering of galaxies around DLAs. This prompted Wolfe 
(1993) to calculate the two-point correlation function at $\langle z \rangle = 2.6$ and to 
conclude that, indeed, Lyα emitters are clustered near DLAs at the 99% or greater confidence 
level. Some recent Lyα searches have succeeded in unveiling the absorber (e.g., Fynbo et al. 
1999). 

Francis & Hewett (1993) reported the discovery of super-clustering of sub-DLAs at 
$z \simeq 2.4$ and 2.9: a total of four H I clouds are seen in a QSO pair separated by 8', each 
being at the same velocity. Recent results from narrow-band imaging of the Francis & Hewett field 
show that spectroscopically confirmed Lyα emitters are clustered at the redshift of the strongest 
H I cloud at z = 2.9 ($\log N_{\rm HI} = 20.9$) towards Q2138-4427 (Fynbo et al. 2003). Roche et 
al. (2000) identified eight Lyα-emitting galaxies near the DLA at z = 2.3 towards PHL 957 in 
addition to the previously discovered Coup Fourré galaxy (Lowenthal et al. 1991), implying the 
presence of a group, filament, or proto-cluster associated with the DLA. Other evidence of 
clustering includes the work of Ellison et al. (2001), who found that the DLA at 
$z_{\rm abs} = 3.37$ towards Q0201+1120 is part of a concentration of matter that includes at 
least four galaxies (including the DLA) over transverse scales greater than $5\,h^{-1}$ Mpc, and 
of D'Odorico et al. (2002), who showed that out of 10 DLAs in QSO pairs, five are matching systems 
within 1000 km s$^{-1}$. They concluded that this result indicates a highly significant 
over-density of strong absorption systems over separation lengths from $\simeq 1$ to 
$8\,h^{-1}$ Mpc. 

Gawiser et al. (2001) studied the cross-correlation of LBGs around one $z \simeq 4$ DLA. Probably 
due to the high redshift of their DLA, Gawiser et al. (2001) found that $w_{dg}(r_\theta)$ is 
consistent with 0, i.e., they found that the distribution of the eight galaxies in that field 
(with spectroscopic redshifts) is indistinguishable from a random distribution. Their data did not 
allow them to put limits on the amplitude of $w_{dg}$. 

Recently, ASSP03 found a lack of galaxies near four DLAs and concluded that the DLA-LBG 
cross-correlation is significantly weaker than the LBG-LBG autocorrelation at the 90% confidence 
level. They found two LBGs within $r_\theta = 5.7\,h^{-1}$ Mpc and within $W_z < 0.0125$ 
($\simeq 8\,h^{-1}$ Mpc), whereas $\simeq 6$ were expected if the cross-correlation has the same 
amplitude as the galaxy autocorrelation. Because of the field of view available, both Gawiser et 
al. (2001) and ASSP03 were not sensitive to scales larger than $r_\theta \simeq 5\,h^{-1}$ Mpc, 
which is important since the relevant scales to measure the DLA-LBG cross-correlation extend up to 
$r_\theta \simeq 10\,h^{-1}$ Mpc. 

However, the results of ASSP03 can be used to put an upper limit on $w_{dg}/w_{gg}$ through the 
following steps. First, note that the two galaxies (in $N_{\rm DLA} = 4$ fields) observed by 
ASSP03 give 

$$\langle N_{\rm obs} \rangle = 2/4 = 0.5 = \langle N_{\rm exp} \rangle\,(1 + \bar{\xi}_{dg}) , \qquad (17)$$

and the six galaxies expected if $\bar{\xi}_{dg} = \bar{\xi}_{gg}$ give 
$\langle N_{\rm exp} \rangle (1 + \bar{\xi}_{gg}) = 6/4$, where $\bar{\xi}$ is the volume average 
of the correlation function. Second, for the LBG autocorrelation published in ASSP03, we find 
$\bar{\xi}_{gg} \simeq 1.1$, averaged over a sphere centered on the DLAs with an effective radius 
of $\simeq 6\,h^{-1}$ Mpc, i.e., with the same volume as the cylindrical cell used by ASSP03. 
Thus, the expected number of galaxies per DLA field is $\langle N_{\rm exp} \rangle = 0.68$ if 
$\bar{\xi}_{dg} = 0$, and the total number of galaxies is 
$4 \times \langle N_{\rm exp} \rangle = 2.85$. Clearly, their measurement of two galaxies is 
consistent with no cross-correlation. From equation 17 we can infer that 
$1 + \bar{\xi}_{dg} = 0.7$ using $\langle N_{\rm exp} \rangle = 0.68$. Third, the uncertainty on 
$\bar{\xi}_{dg}$, $\sigma_\xi$, can be estimated using the results shown in Appendix B. The 
variance $V(\xi) = \sigma_\xi^2$ is made of two terms, the shot noise variance $V_{\rm sn}$ and 
the clustering variance $V_{\rm cl}$. The shot noise variance of $\langle N_{\rm obs} \rangle$ is 
$V(\langle N_{\rm obs} \rangle)_{\rm sn} = \langle N_{\rm obs} \rangle = 0.5$ (eq. B2). The two- 
point clustering variance (eq. IB6|I is simply N (A^ gg ) = 

A 2 (2.50) where A = -h/Ki = 2.28. The three-point 
clustering variance (ea. lB7(l is ~ since £ dg — 0. Finally, 

from equation Eiil a £ = ^^=^^(1 + ^)^07 ~ 

1.06, and a 1σ (2σ) upper limit to $\bar{\xi}_{dg}$ is 
$\bar{\xi}_{dg} + 1(2)\,\sigma_\xi = -0.3 + 1.06\,(2.12) = 0.76\,(1.82)$. Since 
$\bar{\xi}_{gg} = 1.1$, the 1σ (2σ) upper limit to the amplitude ratio is 
$\bar{\xi}_{dg}/\bar{\xi}_{gg} \lesssim 0.70\,(1.65)$, respectively. This rough calculation is 
quite consistent with Adelberger's results, where it was found that 
$\bar{\xi}_{dg} < \bar{\xi}_{gg}$ at the 90% confidence level using Monte Carlo simulations. 
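
The arithmetic of this estimate is easy to reproduce. The sketch below takes the counts, the volume-averaged autocorrelation, and the uncertainty $\sigma_\xi = 1.06$ directly from the text (the variance calculation itself is not repeated); variable names are illustrative. With $\bar{\xi}_{gg} = 1.1$ the unclustered expectation comes out at about 0.71 galaxies per field, close to the 0.68 quoted above, and gives $1 + \bar{\xi}_{dg} = 0.70$.

```python
N_FIELDS = 4                 # DLA fields in ASSP03
N_OBSERVED_TOTAL = 2         # galaxies found near the DLAs
N_EXPECTED_TOTAL = 6         # expected if xi_dg equals xi_gg
XI_GG = 1.1                  # volume-averaged LBG autocorrelation
SIGMA_XI = 1.06              # 1-sigma uncertainty on xi_dg, quoted in the text

n_obs = N_OBSERVED_TOTAL / N_FIELDS                   # <N_obs> per field
n_exp = (N_EXPECTED_TOTAL / N_FIELDS) / (1.0 + XI_GG)  # unclustered expectation
xi_dg = n_obs / n_exp - 1.0                           # from equation 17

# 1-sigma and 2-sigma upper limits on xi_dg and on the amplitude ratio.
upper_limits = [xi_dg + k * SIGMA_XI for k in (1, 2)]
ratio_limits = [limit / XI_GG for limit in upper_limits]

print(f"<N_exp> per field = {n_exp:.2f}, 1 + xi_dg = {1.0 + xi_dg:.2f}")
print("upper limits on xi_dg:", [round(u, 2) for u in upper_limits])
print("upper limits on xi_dg/xi_gg:", [round(r, 2) for r in ratio_limits])
```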

Given that the relevant scales to measure the DLA-LBG cross-correlation extend up to 
$r_\theta \simeq 10\,h^{-1}$ Mpc, Bouché & Lowenthal (2003) were able to first detect and measure 
a DLA-LBG cross-correlation signal (BL04) using the wide-field (0.35 deg$^2$, or 
$\simeq 40^2$ Mpc$^2$ comoving at redshift z = 3) imager MOSAIC on the Kitt Peak 4 m telescope. 
Bouché & Lowenthal (2003) showed that there was an over-density of LBGs by a factor of 
$\simeq 3$ (with 95% confidence) around the $z_{\rm abs} \simeq 3$ DLA towards the quasar 
APM 08279+5255 ($z_{\rm em} = 3.91$) on scales $2.5 < r_\theta < 5\,h^{-1}$ Mpc. Extending the 
results of Bouché & Lowenthal (2003) to three $z \simeq 3$ DLA fields, BL04 probed the DLA-LBG 
cross-correlation on scales $r_\theta \simeq 5$-20 $h^{-1}$ Mpc and found (1) a DLA-LBG 
cross-correlation with a relative amplitude $w_{dg} = (1.62 \pm 1.32)\,w_{gg}$ that is greater 
than zero at the $\simeq 95$% confidence level, and (2) that $w_{dg}$ is most significant on 
scales 5-10 $h^{-1}$ Mpc. In other words, DLAs are clustered with LBGs, but unfortunately the 
sample size did not allow BL04 to test whether a is greater or smaller than 1. Soon, the ongoing 
survey of 9 $z \simeq 3$ DLAs of Cooke et al. (2005) will triple the sample of BL04. 

In a slightly different context, Bouché et al. (2004) successfully applied the technique 
presented here to 212 $z \simeq 0.5$ Mg II systems (of which 50% are expected to be DLAs) using 
luminous red galaxies (LRGs) in the Sloan Digital Sky Survey Data Release 1. They found that the 
Mg II-LRG cross-correlation has an amplitude 0.67 ± 0.09 times that of the LRG-LRG 
autocorrelation, over comoving scales up to $r_\theta = 13\,h^{-1}$ Mpc. Since LRGs have halo 
masses greater than $3.5 \times 10^{12}\,M_\odot$ for $M_R < -21$, this relative amplitude implies 
that the Mg II host galaxies have halo masses greater than 
$\simeq 2$-$8 \times 10^{11}\,M_\odot$. These results show how powerful the cross-correlation 
technique is. 

To summarize the current observational situation on the z = 3 DLA-LBG cross-correlation, ASSP03 
finds that the amplitude ratio is $\bar{\xi}_{dg}/\bar{\xi}_{gg} \lesssim 0.70$, and BL04 finds 
that $\bar{\xi}_{dg}/\bar{\xi}_{gg} \gtrsim 0.30$, both at the 1σ level. Using Monte Carlo 
simulations, ASSP03 finds $\bar{\xi}_{dg}/\bar{\xi}_{gg} < 1.0$ at the 90% confidence level, and 
BL04 finds $\bar{\xi}_{dg}/\bar{\xi}_{gg} > 0.0$ at the 95% confidence level. The DLA halo mass 
range allowed by these observations is still large: it covers 
$\log M_{\rm DLA} \simeq 10$-12 $M_\odot$.$^2$ 

4.4.2. Simulation of Present Observations: $w_{dg}$ with Small Samples 

There are many significant differences between the observational sample of BL04 and the present 
simulated one. First, the shape of the volume is very different: the survey volume of BL04 is 
$40 \times 40 \times 100\,h^{-3}$ Mpc$^3$ (comoving), while these simulations are 
$22.222\,h^{-1}$ Mpc (comoving) on a side. Given that the survey of BL04 contains about 80-120 
LBGs per field, their observed LBG number density corresponds to about seven galaxies per 
$22.222^3\,h^{-3}$ Mpc$^3$. Naturally, seven galaxies are not a fair sample of the LBG luminosity 
function. This is an inherent problem due to the size of the simulation, rendering the comparison 
between the observed and the simulated cross-correlation difficult. Second, as mentioned earlier, 
the simulated LBGs are selected according to their SFR, while the observed LBGs are color 
selected. Third, the same galaxies are used for every simulated line of sight. These differences 
limit our ability to perform a direct comparison to observations. 
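
The "about seven galaxies" figure is a simple volume rescaling, sketched below with the survey and box dimensions quoted above (both in comoving $h^{-1}$ Mpc). Variable names are illustrative.

```python
SURVEY_VOLUME = 40.0 * 40.0 * 100.0   # BL04 field volume, h^-3 Mpc^3
BOX_SIDE = 22.222                     # simulation box side, h^-1 Mpc
BOX_VOLUME = BOX_SIDE ** 3


def galaxies_per_box(n_lbg_per_field):
    """Rescale an observed LBG count per field to the simulation volume."""
    return n_lbg_per_field / SURVEY_VOLUME * BOX_VOLUME


counts = {n: galaxies_per_box(n) for n in (80, 100, 120)}
for n_lbg, n_box in counts.items():
    print(f"{n_lbg} LBGs per field -> {n_box:.1f} per simulation box")
```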

With these caveats in mind, we can repeat our analysis of section 4.2 in the limit of small 
$N_{\rm DLA}$ and with similar galaxy number densities. Because, to first order, 
$\sigma_w \propto (\sqrt{N_{\rm DLA} N_{\rm gal}})^{-1}$ (eq. B11), a sample made of 10 DLAs and 
25 galaxies per $22.222^2\,h^{-2}$ Mpc$^2$ "field" is expected to have similar errors to the 
sample of BL04 made of three DLAs and 100 galaxies per $40^2\,h^{-2}$ Mpc$^2$ field. As for the 
full sample, we restricted ourselves to scales $r_\theta > 1\,h^{-1}$ Mpc, which also corresponds 
to the most relevant scales 5-10 $h^{-1}$ Mpc of the observations of BL04. We find that the 
relative amplitude of the cross-correlation with 10 lines of sight and 25 galaxies is 
a = 0.77 ± 0.53, whereas BL04 found a = 1.62 ± 1.32, i.e., both with the same S/N. 

$^2$ After this paper's submission, we learned that P. Monaco et al. (2005, private 
communication) constrained the halo mass of a few individual DLAs to be around 
$5 \times 10^{11}\,M_\odot$. Their mass estimates come from the emission-absorption redshift 
difference as a proxy for a rotation curve. 

This confirms the results of BL04. More importantly, one can now use the result for this sample 
made of 10 DLAs and 25 galaxies (with a surface density $\Sigma_g \simeq 0.05$ Mpc$^{-2}$) as a 
benchmark to predict the S/N for the larger samples of future observations, given that the S/N 
will be proportional to $\sqrt{N_{\rm gal} N_{\rm DLA}}$ (eq. B11). 
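
A minimal sketch of how this benchmark could be used follows. It assumes that the amplitude ratio itself stays fixed and that only the error shrinks as $(N_{\rm gal} N_{\rm DLA})^{-1/2}$; the function name and the example sample sizes are hypothetical.

```python
import math

# Benchmark from the simulated small sample: 10 DLAs, 25 galaxies per field.
BENCH_N_DLA, BENCH_N_GAL = 10, 25
BENCH_RATIO, BENCH_ERROR = 0.77, 0.53


def predicted_snr(n_dla, n_gal):
    """Scale the benchmark S/N by sqrt(N_gal * N_DLA), to first order."""
    scale = math.sqrt((n_dla * n_gal) / (BENCH_N_DLA * BENCH_N_GAL))
    return (BENCH_RATIO / BENCH_ERROR) * scale


benchmark_snr = predicted_snr(BENCH_N_DLA, BENCH_N_GAL)
larger_snr = predicted_snr(40, 25)   # hypothetical survey with 4x more DLAs
print(f"benchmark S/N = {benchmark_snr:.2f}, 40 DLAs -> S/N = {larger_snr:.2f}")
```

Quadrupling either the number of DLAs or the number of galaxies per field doubles the expected S/N under this scaling.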

5. CONCLUSIONS 

Motivated by the fact that (1) the amplitude of the cross-correlation is a measurement of the 
mean DLA halo mass and (2) observational constraints (Gawiser et al. 2001; ASSP03; BL04; J. Cooke 
et al. 2005, private communication) are reaching a turning point and the DLA halo masses are 
starting to be constrained, we tested the cross-correlation technique using TreeSPH cosmological 
simulations. The method uses the ratio of the cross-correlation between DLAs and high-redshift 
galaxies to the autocorrelation of the galaxies themselves, which is (in linear theory) the ratio 
of their bias factors, to infer the mean DLA halo mass. 

In a TreeSPH simulation (Katz et al. 1996a) parallelized by Davé et al. (1997) with $128^3$ 
particles in a volume of $22.222^3\,h^{-3}$ Mpc$^3$ (comoving), we find the following: 

1. Scales $r_\theta > 1$-15 Mpc are the most relevant scales to constrain the mean DLA halo mass 
using the projected cross-correlation $w_{dg}(r_\theta)$. 

2. The DLA-galaxy cross-correlation has an amplitude $w_{dg} = (0.73 \pm 0.08) \times w_{gg}$, 
close to the predicted value of 0.771 using the Mo & White (2002) formalism. 

3. The inferred mean DLA halo mass is 

$$\langle \log M_{\rm DLA}(M_\odot) \rangle = 11.13^{+0.13}_{-0.13} , \qquad (18)$$

in excellent agreement with the true value of the simulations, i.e., 
$\langle \log M_{\rm DLA} \rangle = 11.16$. Thus, even though DLAs and galaxies occupy a broad 
range of halos, with massive halos containing multiple galaxies with DLAs, the cross-correlation 
technique yields the first moment of the DLA halo mass distribution. 

4. If we consider subsets of the simulated galaxies with higher star-formation rates 
(representing LBGs), the cross-correlation technique is self-consistent, i.e., the DLA mass 
inferred from the ratio of the correlation functions does not depend on the galaxy sample used. 
This demonstrates the reliability of the method. 

5. For real $z \simeq 3$ LBGs with a correlation length $r_{0,gg} \simeq 4\,h^{-1}$ Mpc (ASSP03; 
Adelberger et al. 2004), our results imply that the DLA-LBG cross-correlation is expected to have 
a correlation length $r_{0,dg} \simeq 2.85\,h^{-1}$ Mpc. 

6. With small samples (with 10 lines of sight and 25 galaxies) matching the statistics of BL04, 
the relative amplitude of the cross-correlation is a = 0.77 ± 0.53, i.e., with a signal-to-noise 
ratio (S/N $\simeq$ 1.3-1.5) comparable to BL04, where they found a = 1.62 ± 1.32. 

In short, the cross-correlation between galaxies and DLAs is a powerful and self-consistent 
technique to constrain the mean mass of DLAs, and we have demonstrated its reliability. Given the 
resolution limits of the simulation used here ($M_h > 5.2 \times 10^{10}\,M_\odot$), our values 
are strictly upper limits. These simulation results suggest that DLAs are expected to be less 
massive than z = 3 LBGs by a factor of at least $\simeq 4.8$. 

Recently, Cassata et al. (2004) studied the morphology of K-selected galaxies at redshifts up to 
$z \simeq 2.5$ and found that the late-type fraction drops beyond z > 2. Erb et al. (2004) show 
that the kinematics of 13 z > 2 morphologically elongated galaxies are not consistent with those 
of an inclined disk. Furthermore, the virial mass of these galaxies is in the range of a few times 
$10^{10}\,M_\odot$ up to $5 \times 10^{10}\,M_\odot$. These results and the ones presented here 
disfavor the presence of large, massive $10^{12}\,M_\odot$ disks at z > 2 and therefore the 
massive disk hypothesis for DLAs. 

Current observational samples are just starting to put constraints on $w_{dg}/w_{gg}$ for 
$z \simeq 3$ DLAs: BL04 found $w_{dg}/w_{gg} > 0$ at the 95% confidence level, and ASSP03 found 
$w_{dg}/w_{gg} < 1$ at the 90% confidence level, allowing the mass range 
$\langle \log M_{\rm DLA} \rangle \simeq 10$-12 $M_\odot$. Thanks to planned wide-field imagers, 
future observations will be able to distinguish between models in which DLAs reside in low-mass 
halos and those in which DLAs are massive disks occupying only high-mass halos. 
APPENDIX 

A. CROSS-CORRELATION AND AUTOCORRELATION FUNCTIONS 

For a given absorber with galaxies distributed with dN/dz, one may think that the projected 
autocorrelation $w_{p,gg}(r_\theta)$ is proportional to $\int (dN/dz)^2\, dz$ while the 
cross-correlation $w_{dg}(r_\theta)$ is proportional to $\int (dN/dz)\, dz$. Thus, at first 
glance, their ratio is not very useful. Below we show the situation to be not so trivial. In this 
appendix, we merely connect previously published results to show that the amplitudes of both 
$w_{gg}(r_\theta)$ and $w_{dg}(r_\theta)$ are proportional to $1/W_z$, where $W_z$ is the redshift 
width of the galaxy distribution (determined by the box size or by observational selections such 
as photometric techniques). 
First, we list some definitions and three results (eqs. A1-A3) that will be useful later. For a 
3D correlation function $\xi(r) = (r/r_0)^{-\gamma}$, the projected correlation function 
$w_p(r_p)$ is (Davis & Peebles 1983) 

$$w_p(r_p) = \int_{-\infty}^{\infty} dy\, \xi(r_p, y) 
 = \int_{-\infty}^{\infty} dy\, \xi\!\left(\sqrt{r_p^2 + y^2}\right) 
 = r_p^{1-\gamma}\, r_0^{\gamma}\, H_\gamma , \qquad (A1)$$

where $\xi(r_p, y)$ is the 3D correlation function decomposed along the line of sight $y$ and on 
the plane of the sky $r_p$, i.e., $r^2 = y^2 + r_p^2$. The parameter $H_\gamma$ is in fact the 
beta function $B(a, b) = \int_0^1 t^{a-1} (1 - t)^{b-1}\, dt$ evaluated with $a = 1/2$ and 
$b = (\gamma - 1)/2$, i.e., 
$H_\gamma = B(1/2, [\gamma - 1]/2) = \Gamma(1/2)\,\Gamma([\gamma - 1]/2)/\Gamma(\gamma/2)$. 
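
Equation A1 can be verified numerically. The sketch below compares the gamma-function expression for $H_\gamma$ with a direct midpoint-rule integration of $\int_{-\infty}^{\infty} (1 + s^2)^{-\gamma/2}\, ds$, where $s = y/r_p$. The changes of variable are a choice made here for numerical convenience, not part of the original derivation, and the sketch assumes $1 < \gamma \le 2$.

```python
import math


def h_gamma_analytic(gamma):
    """H_gamma = Gamma(1/2) Gamma((gamma-1)/2) / Gamma(gamma/2)."""
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))


def h_gamma_numeric(gamma, n=20000):
    """Midpoint-rule estimate of the integral of (1 + s^2)^(-gamma/2).

    With s = tan(theta) the integral is 2 * int_0^{pi/2} sin(phi)^(gamma-2) dphi
    (phi = pi/2 - theta). Substituting phi = u**k with k = 1/(gamma-1)
    removes the integrable singularity at phi = 0.
    """
    k = 1.0 / (gamma - 1.0)
    upper = (math.pi / 2.0) ** (1.0 / k)
    step = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * step          # midpoints never hit u = 0
        phi = u ** k
        total += k * (math.sin(phi) / phi) ** (gamma - 2.0)
    return 2.0 * total * step


checks = {g: (h_gamma_analytic(g), h_gamma_numeric(g)) for g in (1.6, 1.8, 2.0)}
for gamma, (analytic, numeric) in checks.items():
    print(gamma, analytic, numeric)
```

For γ = 2 both expressions reduce to π, which provides a quick sanity check.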

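In appendix C of ASSP03, one finds the expected number of neighbors between 
$r_\theta - dr/2$ and $r_\theta + dr/2$ within a redshift distance $|\Delta z| < r_z$: 

$$\omega_p(r_\theta; <r_z) = \frac{1}{r_z} \int_0^{r_z} dl\, \xi\!\left(\sqrt{r_\theta^2 + l^2}\right) 
 = \frac{1}{2 r_z}\, r_0^{\gamma}\, r_\theta^{1-\gamma}\, H_\gamma\, I_x\!\left(\frac{1}{2}, \frac{\gamma - 1}{2}\right) , \qquad (A2)$$

where $x = r_z^2/(r_z^2 + r_\theta^2)$ and $I_x$ is the incomplete beta function 
$B_x(a, b) = \int_0^x t^{a-1} (1 - t)^{b-1}\, dt$ normalized by $B(a, b)$: 
$I_x(a, b) = B_x(a, b)/B(a, b)$. 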

Many papers (Phillipps et al. 1978; Peebles 1993; Budavári et al. 2003) have shown that the 
angular correlation function is 

$$w(\theta) = \theta^{1-\gamma}\, r_0^{\gamma}\, H_\gamma \int_0^{\infty} dz \left(\frac{dN}{dz}\right)^2 g(z)^{-1} f(z)^{1-\gamma} , \qquad (A3)$$

where $g(z) = dr/dz = c/H(z)$ and $f(z) = D_c(z)$ is the comoving line-of-sight distance to 
redshift z, i.e., $D_c(z) = \int_0^z dt\, c/H(t)$. 

Equation A3 can be derived from the definitions of the angular and 3D correlation functions, 
$w(\theta)$ and $\xi(r)$ (e.g., Phillipps et al. 1978). We reproduce the derivation here and 
extend it to projected auto- and cross-correlation functions. The probabilities of finding a 
galaxy in a volume $dV_1$ and another in a volume $dV_2$ at a distance 
$r = |\mathbf{r}_2 - \mathbf{r}_1|$ along two lines of sight separated by $\theta$ are 

$$dP(\theta) = \mathcal{N}^2\, d\Omega_1\, d\Omega_2\, [1 + w(\theta)] , \qquad (A4)$$

or 

$$dP(r) = n(z)^2\, dV_1\, dV_2\, [1 + \xi(r)] , \qquad (A6)$$

where $\mathcal{N}$ is the number of galaxies per solid angle, i.e., $dN/d\Omega$, and $n(z)$ is 
the number density of galaxies, which can be a function of redshift. Given that 
$\mathcal{N} = (1/d\Omega) \int n(z)\, dV(z)$ and that $dV = f^2(z)\, g(z)\, d\Omega\, dz$, 
$\mathcal{N} = \int dz\, (dN/dz) = \int dz\, n(z)\, f^2(z)\, g(z)$. 

To relate $w(\theta)$ and $\xi(r)$, one needs to integrate equation A6 over all possible lines 
of sight separated by $\theta$ (i.e., along $z_1$ and $z_2$) and equate it with equation A4: 

$$\mathcal{N}^2\, [1 + w(\theta)] = \int_0^{\infty} dz_1\, f(z_1)^2 g(z_1) n(z_1) 
 \int_0^{\infty} dz_2\, f(z_2)^2 g(z_2) n(z_2)\, [1 + \xi(r_{12})] . \qquad (A7)$$

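In the regime of small angles, the distance $r_{12}$ (in comoving Mpc) can be approximated by 

$$r_{12}^2 = r_1^2 + r_2^2 - 2 r_1 r_2 \cos\theta \simeq (r_1 - r_2)^2 + r^2 \theta^2 \quad \text{with } r = \frac{r_1 + r_2}{2} ,$$

$$\simeq \left(g(z)\,(z_1 - z_2)\right)^2 + f(z)^2 \theta^2 \quad \text{with } z = \frac{z_1 + z_2}{2} ,$$

$$\simeq g(z)^2 y^2 + f(z)^2 \theta^2 \quad \text{with } y = z_1 - z_2 . \qquad (A8)$$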

Changing variables in equation A7 from $(z_1, z_2)$ to $(z, y)$, assuming that the major 
contribution is from $z_1 \simeq z_2$, and using equation A8, the angular correlation function is 

$$w(\theta) = \frac{\int_0^{\infty} dz\, f(z)^4 g(z)^2 n(z)^2 \int_{-\infty}^{\infty} dy\, \xi\!\left(\sqrt{f(z)^2 \theta^2 + g(z)^2 y^2}\right)}{\left[\int_0^{\infty} dz\, f(z)^2 g(z) n(z)\right]^2} . \qquad (A9)$$

Changing variables to $l = g(z)\,y$, using equation A1, and using a normalized redshift 
distribution, i.e., $\int dz\, (dN/dz) = 1$, equation A9 becomes 

$$w(\theta) = \int_0^{\infty} dz \left(\frac{dN}{dz}\right)^2 g(z)^{-1} \times [f(z)\,\theta]^{1-\gamma}\, r_0^{\gamma}\, H_\gamma , \qquad (A10)$$
which leads to equation A3 (eq. 9 in Budavári et al. 2003) and is one version of Limber's 
equations. 

In this paper, we measured the projected autocorrelation of the LRGs, $w_{gg}(r_\theta)$, where 
$r_\theta = f(z)\,\theta$.$^3$ Following the same steps as above with $r_\theta$ instead of 
$\theta$, and $dV = (dr_\theta)^2\, g(z)\, dz$, $w_{gg}(r_\theta)$ is 

$$w_{gg}(r_\theta) = r_\theta^{1-\gamma}\, r_{0,gg}^{\gamma}\, H_\gamma \int_0^{\infty} dz \left(\frac{dN}{dz}\right)^2 g(z)^{-1} . \qquad (A11)$$

In the case of the projected cross-correlation, $w_{dg}(r_\theta)$, the conditional probability 
of finding a galaxy in the volume $dV_2$ given that there is an absorber at a known position 
$\mathbf{r}_1$ is, by definition (e.g., Eisenstein 2003), 

$$dP(2|1)(r_\theta) = \mathcal{N}_g\, d\Omega_2\, [1 + w_{dg}(r_\theta)] , \qquad (A12)$$

$$dP(2|1)(r) = n_g(z)\, dV_2\, [1 + \xi_{dg}(r)] . \qquad (A13)$$

Using the same approximations (eq. A8) and one integral along the line of sight $z_2$ (keeping 
the absorber at $z_1$), one finds that the projected cross-correlation is 

$$w_{dg}(r_\theta) = \int_0^{\infty} dz_2\, f(z_2)^2 g(z_2) n(z_2)\, \xi_{dg}(r_{12}) ,$$

$$= \int_0^{\infty} dz_2\, \frac{dN}{dz}\, \xi_{dg}\!\left(\sqrt{r_\theta^2 + g(z)^2 (z_1 - z_2)^2}\right) ,$$

$$= \int dl\, \frac{dN}{dl}\, \xi_{dg}\!\left(\sqrt{r_\theta^2 + l^2}\right) , \qquad (A14)$$

$$= \frac{1}{W_z}\, r_\theta^{1-\gamma}\, r_{0,dg}^{\gamma}\, H_\gamma , \qquad (A15)$$

where we approximated dN/dz with a normalized top-hat of width $W_z = 2 r_z$, used equation A2, 
and used the fact that $I_x \simeq 1$, since $x = r_z^2/(r_z^2 + r_\theta^2) \simeq 1$ for a 
redshift width $W_z$ of $20\,h^{-1}$ Mpc and $r_\theta \simeq 1\,h^{-1}$ Mpc.$^4$ Thus, as one 
would have expected, the cross-correlation is inversely proportional to the width of the galaxy 
distribution. 

Naturally, in equations A12 and A13 the redshift of galaxy 1 (i.e., the absorber) is assumed to 
be known with good precision. If the absorber population had poorly known redshifts, one would 
need to add an integral to equation A14, washing out the cross-correlation signal further. 

For the projected autocorrelation (eq. A11), if one approximates dN/dz by a top-hat function of 
width $W_z$, then 

$$w_{gg}(r_\theta) = r_\theta^{1-\gamma}\, r_{0,gg}^{\gamma}\, H_\gamma \times \int_0^{\infty} dz\, g(z) \left(\frac{dN}{dz}\right)^2 g(z)^{-2} ,$$

$$= \frac{1}{W_z} \times r_\theta^{1-\gamma}\, r_{0,gg}^{\gamma}\, H_\gamma , \qquad (A16)$$

which shows that the autocorrelation depends on the redshift distribution of the galaxies in the 
same way as the cross-correlation, i.e., $\propto 1/W_z$. The reason for this is that the redshift 
distribution dN/dz has a very different role with respect to the two correlation functions, which 
can be seen by comparing equations A11 and A14. It is this very different role that leads to the 
same $1/W_z$ dependence. 

In the case of a Gaussian redshift distribution dN/dz, the ratio of cross- and autocorrelations 
may not be exactly $r_{0,dg}/r_{0,gg}$ if the approximation leading to A14 breaks down. Using mock 
galaxy samples (from the GIF2 collaboration, Gao et al. 2004) selected in a redshift slice of 
width $W_z$ equal to their artificial Gaussian redshift errors $\sigma_z$, we find that the 
cross-correlation is overestimated by 25% ± 10%. This correction factor is independent of the 
width of the redshift distribution as long as $\sigma_z \simeq W_z$ or as long as it is Gaussian. 
This implies that the ratio of the correlation functions ($w_{dg}/w_{gg}$) will be insensitive to 
errors in photometric redshifts. 

B. THE ERRORS TO CORRELATION FUNCTIONS 
In this appendix, we list the basic properties of the errors to correlation functions. 

From the definition of the cross-correlation $\xi_{dg}$ (eq. A13), the expected number of galaxies in a cell of volume $\Delta V$ centered on a DLA is given by the counts of neighbor galaxies:

$$\langle N_{obs} \rangle = N\,(1 + \bar{\xi}_{dg}(r)), \tag{B1}$$

where $N = n_u\,\Delta V$.

3 In general this should be $D_A(1+z)\,\theta$, where $D_A$ is the angular distance. For a flat universe, $D_A(1+z) = D_M = D_C = f(z)$, where $D_M$ is the comoving transverse distance, using D. Hogg's notation (Hogg 1999).

4 The incomplete beta function $I_x = 0.94$ for $W_z = 22.222\,h^{-1}$ Mpc and $r_\theta = 1\,h^{-1}$ Mpc.

Various textbooks (e.g., Peebles 1980, section 36) have shown that the variance of the number of neighbor galaxies $N_{obs}$ near a DLA is the sum of the shot noise,

$$V(N_{obs})_{sn} = N_{obs}, \tag{B2}$$

and the clustering variance $V(N_{obs})_{cl}$. The clustering variance is itself the sum of two terms, $V_{2pt}$ and $V_{3pt}$,

$$V(N_{obs})_{2pt} = N^2\, \frac{1}{\Delta V^2} \int_{\Delta V}\!\int_{\Delta V} \xi_{gg}(|\mathbf{r}_2 - \mathbf{r}_1|)\, dV_1\, dV_2, \tag{B3}$$

$$V(N_{obs})_{3pt} = N^2\, \frac{1}{\Delta V^2} \int_{\Delta V}\!\int_{\Delta V} \left[\zeta_{dgg}(r_1, r_2) - \xi_{dg}(r_1)\,\xi_{dg}(r_2)\right] dV_1\, dV_2, \tag{B4}$$

where $N = n_u\,\Delta V = N_{exp}$, $\xi_{gg}$ is the galaxy-galaxy autocorrelation, and $\zeta_{dgg}$ is the three-point correlation function. The function $\zeta$ can be written as a sum of products of two-point correlations (Peebles 1980):

$$\zeta_{dgg}(r_1, r_2) = Q\left[\xi_{dg}(r_1)\,\xi_{dg}(r_2) + \xi_{dg}(r_1)\,\xi_{gg}(|\mathbf{r}_1 - \mathbf{r}_2|) + \xi_{dg}(r_2)\,\xi_{gg}(|\mathbf{r}_1 - \mathbf{r}_2|)\right], \tag{B5}$$

where $r_1 = |\mathbf{r}_a + \mathbf{r}_1|$, $r_2 = |\mathbf{r}_a + \mathbf{r}_2|$ and $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$.
For a spherical volume $\Delta V$, the integrals B3 and B4 can be written as (using the results in Peebles 1980, § 59):

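$$V(N_{obs})_{2pt} = N^2 \left(\frac{r_{0,gg}}{r}\right)^{\gamma} J_2 = N^2\, \frac{J_2}{K_1}\, \bar{\xi}_{gg}, \tag{B6}$$

\begin{align}
V(N_{obs})_{3pt} &= N^2 \left\{ Q \left[ K_1^2 \left(\frac{r_{0,dg}}{r}\right)^{2\gamma} + 2 K_2 \left(\frac{r_{0,gg}}{r}\right)^{\gamma} \left(\frac{r_{0,dg}}{r}\right)^{\gamma} \right] - K_1^2 \left(\frac{r_{0,dg}}{r}\right)^{2\gamma} \right\}, \nonumber \\
&= N^2 \left[ Q\, \bar{\xi}_{dg}^2 + 2 Q\, \frac{K_2}{K_1^2}\, \bar{\xi}_{gg}\, \bar{\xi}_{dg} - \bar{\xi}_{dg}^2 \right], \tag{B7}
\end{align}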



where $\bar{\xi} = \frac{1}{\Delta V}\int \xi\, dV = K_1\,(r_0/r)^{\gamma}$, $J_2 = 72/[(3-\gamma)(4-\gamma)(6-\gamma)2^{\gamma}]$, $K_1 = 3/(3-\gamma)$, and $K_2$ can only be computed numerically. For $\gamma = 1.6$, $J_2 = 4.87$, $K_1 \simeq 2.14$, and $K_2 \simeq 4$.
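
The closed-form constants can be evaluated directly. The minimal sketch below computes $K_1 = 3/(3-\gamma)$ and the $J_2$ expression both with and without the $2^{\gamma}$ factor in the denominator; the switch `include_two_pow_gamma` is a hypothetical parameter introduced only for this comparison. For $\gamma = 1.6$, $K_1$ comes out as 2.14, while the $J_2$ expression gives about 1.61 with the $2^{\gamma}$ factor and about 4.87 without it, so the quoted $J_2 = 4.87$ appears to correspond to the form without that factor. Which normalization is intended (for instance, a different convention for $r$) cannot be settled by this sketch.

```python
def k1(gamma):
    # Volume-average factor for a power law inside a sphere: 3 / (3 - gamma).
    return 3.0 / (3.0 - gamma)

def j2(gamma, include_two_pow_gamma=True):
    # J_2 = 72 / [(3 - gamma)(4 - gamma)(6 - gamma) 2^gamma];
    # the flag drops the 2^gamma factor for comparison (hypothetical switch).
    base = 72.0 / ((3.0 - gamma) * (4.0 - gamma) * (6.0 - gamma))
    return base / 2.0 ** gamma if include_two_pow_gamma else base

if __name__ == "__main__":
    gamma = 1.6
    print("K1 =", round(k1(gamma), 2))
    print("J2 (with 2^gamma) =", round(j2(gamma), 2))
    print("J2 (without 2^gamma) =", round(j2(gamma, False), 2))
```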
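The variance of the estimator of $\xi$ can be computed analytically. From Landy & Szalay (1993), it is

\begin{align}
V(\xi) = V\!\left(\frac{\langle N_{obs} \rangle}{\langle N_{rand} \rangle}\right) &= \frac{V(\langle N_{obs} \rangle)}{\langle N_{rand} \rangle^2} + \frac{V(\langle N_{rand} \rangle)\,\langle N_{obs} \rangle^2}{\langle N_{rand} \rangle^4}, \nonumber \\
&= \left[ \frac{V(\langle N_{obs} \rangle)}{\langle N_{obs} \rangle^2} + \frac{V(\langle N_{rand} \rangle)}{\langle N_{rand} \rangle^2} \right] (1 + \xi)^2. \tag{B8}
\end{align}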

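The shot noise of the random sample in equation B8, $V(\langle N_{rand} \rangle)/\langle N_{rand} \rangle^2$, can be neglected because the random sample of galaxies is intentionally much larger than the sample of observed galaxies. Thus, the rms ($1\sigma$) of $\xi_{dg}$ is

$$\sigma_\xi = \frac{\sigma(\langle N_{obs} \rangle)}{\langle N_{obs} \rangle}\,(1 + \xi_{dg}), \tag{B9}$$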



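where $\sigma(\langle N_{obs} \rangle) = \frac{1}{\sqrt{N_{DLA}}} \sqrt{V(N_{obs})}$, and $V(N_{obs})$ is given by the sum of equations B2–B4. If we approximate the clustering variance of $N_{obs}$ (eqs. B3 and B4) by $V_{cl} = N^2 \left(A\,\bar{\xi}_{gg} + B\,\bar{\xi}_{dg}^2 + C\,\bar{\xi}_{dg}\,\bar{\xi}_{gg}\right)$, where $A = J_2/K_1$, $B = Q - 1$, and $C = 2 Q K_2/K_1^2$ are constants (eqs. B6–B7), then $\sigma(\langle N_{obs} \rangle)$ becomes

\begin{align}
\sigma(\langle N_{obs} \rangle) &= \frac{1}{\sqrt{N_{DLA}}} \sqrt{V(N_{obs})_{sn} + V(N_{obs})_{cl}}, \nonumber \\
&= \frac{1}{\sqrt{N_{DLA}}} \sqrt{\langle N_{obs} \rangle + N^2 \left(A\,\bar{\xi}_{gg} + B\,\bar{\xi}_{dg}^2 + C\,\bar{\xi}_{dg}\,\bar{\xi}_{gg}\right)}. \nonumber
\end{align}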

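Therefore, the expected rms of the cross-correlation function, $\sigma_\xi$ (eq. B9), becomes

$$\sigma_\xi = \frac{1 + \bar{\xi}_{dg}}{\sqrt{N_{DLA}}\,\langle N_{obs} \rangle} \sqrt{\langle N_{obs} \rangle + N^2 \left(A\,\bar{\xi}_{gg} + B\,\bar{\xi}_{dg}^2 + C\,\bar{\xi}_{dg}\,\bar{\xi}_{gg}\right)}, \tag{B10}$$

or

$$\sigma_\xi = \frac{1}{\sqrt{N_{DLA}}}\, \frac{1}{\sqrt{N}} \left[ (1 + \bar{\xi}_{dg}) + N \left(A\,\bar{\xi}_{gg} + B\,\bar{\xi}_{dg}^2 + C\,\bar{\xi}_{dg}\,\bar{\xi}_{gg}\right) \right]^{1/2}, \tag{B11}$$

using equation B1. This expression is proportional to $\frac{1}{\sqrt{N_{DLA}}}\,\frac{1}{\sqrt{N}}$, as one might have expected. Thus, the noise in $\xi$ goes as the inverse of the square root of the number of DLAs, $N_{DLA}$, and as the inverse of the square root of the number of galaxies $N$ in the cell of volume $\Delta V$.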
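
The scaling with $N_{DLA}$ can be checked with a minimal sketch. It assumes the estimator $\xi = \langle N_{obs} \rangle / N - 1$ with a noiseless random expectation $N$, so that $\sigma_\xi = \sigma(\langle N_{obs} \rangle)/N$, and it takes the variance of $N_{obs}$ as the sum of the shot noise and the approximate clustering term $N^2 (A\,\bar{\xi}_{gg} + B\,\bar{\xi}_{dg}^2 + C\,\bar{\xi}_{dg}\,\bar{\xi}_{gg})$. The numerical inputs, including the value of $Q$, are placeholders chosen for illustration only.

```python
import math

def sigma_xi(n_dla, n_exp, xi_dg, xi_gg, a, b, c):
    # Expected neighbor count per DLA cell: <N_obs> = N (1 + xi_dg).
    n_obs = n_exp * (1.0 + xi_dg)
    # Shot-noise variance plus the approximate clustering variance.
    v_sn = n_obs
    v_cl = n_exp ** 2 * (a * xi_gg + b * xi_dg ** 2 + c * xi_dg * xi_gg)
    # Averaging over n_dla independent cells reduces the variance by n_dla.
    sigma_n_obs = math.sqrt((v_sn + v_cl) / n_dla)
    # Assumption: the random expectation N is noiseless, so sigma_xi = sigma(<N_obs>) / N.
    return sigma_n_obs / n_exp

if __name__ == "__main__":
    j2, k1, k2 = 4.87, 2.14, 4.0   # constants quoted in the text for gamma = 1.6
    q = 1.0                        # placeholder value, not taken from the text
    a, b, c = j2 / k1, q - 1.0, 2.0 * q * k2 / k1 ** 2
    for n_dla in (10, 40, 160):    # hypothetical sample sizes
        print(n_dla, sigma_xi(n_dla, 5.0, 0.5, 0.3, a, b, c))
```

Each fourfold increase in `n_dla` halves the returned value, which is the $1/\sqrt{N_{DLA}}$ behavior described above. With the clustering terms switched off and $\bar{\xi}_{dg} = 0$, the function reduces to $1/\sqrt{N_{DLA}\,N}$.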